[ https://issues.apache.org/jira/browse/HDFS-12946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671118#comment-16671118 ]
Xiao Chen commented on HDFS-12946:
----------------------------------

Thanks for the patch Kitti! I think we're getting close. Some review comments:
- FSNamesystem: We generally need fsn/fsd locks when accessing internal state. In this case, I think DNManager is fine, but ECPManager should be protected with a readlock.
- ErasureCodingClusterSetupVerifier: I think we should extract the logic more finely. In the NN, we don't need to loop through the datanodes to get the number of racks - we can get it straight from {{NetworkTopology}} (e.g. via DNManager). IMO the 'highly uneven rack' check feels like something we can do as a future improvement. It's more subjective, and the problem will be visible whether the data is EC'ed or not.
- Following the above, there would be no need for {{reportSet.toArray}} in the NN. With thousands of DNs in a cluster, this could be perf-heavy.
- EcClusterSetupVerifyResult: Private classes don't have to define an {{InterfaceStability}}; they're Unstable by default.
- Naming: {{ErasureCodingClusterSetupVerifier}} feels a bit long. How about {{ECTopologyVerifier}}? We can assume EC is a known concept, since this is an HDFS-private class; if it confuses future developers, the class javadoc should make it fairly clear. Similarly for the method names: instead of {{getVerifyClusterSetupSupportsEnabledEcPoliciesResult}}, I think {{getECTopologyVerifierResult}} should be ok, or even {{verifyECWithTopology}}.
- There's an unnecessary change in {{ECBlockGroupsMBean}}

> Add a tool to check rack configuration against EC policies
> ----------------------------------------------------------
>
>                 Key: HDFS-12946
>                 URL: https://issues.apache.org/jira/browse/HDFS-12946
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: erasure-coding
>            Reporter: Xiao Chen
>            Assignee: Kitti Nanasi
>            Priority: Major
>         Attachments: HDFS-12946.01.patch, HDFS-12946.02.patch, HDFS-12946.03.patch, HDFS-12946.04.fsck.patch, HDFS-12946.05.patch
>
> From testing we have seen setups with problematic racks / datanodes that would not suffice for basic EC usage. These are usually found out only after the tests have failed. We should provide a way to check this beforehand.
> Some scenarios:
> - not enough datanodes compared to the EC policy's highest data+parity number
> - not enough racks to satisfy BPPRackFaultTolerant
> - highly uneven racks to satisfy BPPRackFaultTolerant
> - highly uneven racks (so that BPP's considerLoad logic may exclude some busy nodes on a rack, resulting in #2)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
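To make the first two scenarios concrete, here is a minimal sketch of the kind of feasibility checks a verifier could run. The class and method names are illustrative only (not from the attached patches), and the rack rule is one plausible check, not necessarily the one BPPRackFaultTolerant enforces: BPPRackFaultTolerant spreads a block group of data+parity blocks across racks, so each rack holds roughly ceil((data+parity)/numRacks) blocks, and losing one rack should cost no more blocks than the policy's parity count can reconstruct.

```java
// Illustrative sketch only; names and the rack heuristic are assumptions,
// not the HDFS-12946 patch.
public class EcTopologyCheck {

  /** An EC policy needs at least data + parity datanodes (e.g. RS-6-3 needs 9). */
  static boolean hasEnoughDataNodes(int numDataNodes, int dataUnits,
      int parityUnits) {
    return numDataNodes >= dataUnits + parityUnits;
  }

  /**
   * Rough rack-fault-tolerance check: with the block group spread evenly,
   * each rack holds at most ceil((data+parity)/numRacks) blocks. If a single
   * rack failure can lose more blocks than there are parity units, the group
   * cannot be reconstructed.
   */
  static boolean hasEnoughRacks(int numRacks, int dataUnits, int parityUnits) {
    if (numRacks <= 0) {
      return false;
    }
    int totalBlocks = dataUnits + parityUnits;
    int maxBlocksPerRack = (totalBlocks + numRacks - 1) / numRacks; // ceil
    return maxBlocksPerRack <= parityUnits;
  }

  public static void main(String[] args) {
    // RS-6-3 on a 9-DN, 3-rack cluster: 3 blocks per rack, tolerable.
    System.out.println(hasEnoughDataNodes(9, 6, 3) && hasEnoughRacks(3, 6, 3));
    // RS-6-3 on only 2 racks: 5 blocks on one rack > 3 parity units.
    System.out.println(hasEnoughRacks(2, 6, 3));
  }
}
```

In the NN, `numDataNodes` and `numRacks` would come from DNManager / {{NetworkTopology}} rather than from iterating the datanode reports, per the comments above.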