[ 
https://issues.apache.org/jira/browse/HDFS-12946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671118#comment-16671118
 ] 

Xiao Chen commented on HDFS-12946:
----------------------------------

Thanks for the patch, Kitti! I think we're getting close.

Some review comments:
- FSNamesystem: We generally need fsn/fsd locks when accessing internal 
state. In this case, I think DNManager is fine, but the ECPManager access 
should be protected with a readlock (rough sketch of what I mean after this 
list).
- ErasureCodingClusterSetupVerifier: I think we should extract the logic at a 
finer granularity. In NN, we don't need to loop through the datanodes to get 
the number of racks - we can just get it from {{NetworkTopology}} (e.g. via 
DNManager); see the second sketch below. IMO the 'highly uneven rack' check 
feels like something we can do as a future improvement. It's more subjective, 
and the problem will be visible whether the data is EC'ed or not.
- Following the above, there would be no need to call {{reportSet.toArray}} in 
NN. With thousands of DNs in a cluster, that could be perf-heavy.
- EcClusterSetupVerifyResult: A private class doesn't have to define an 
{{InterfaceStability}} annotation; it's Unstable by default.
- Naming: {{ErasureCodingClusterSetupVerifier}} feels a bit long. How about 
{{ECTopologyVerifier}}? We can assume EC is a known concept since this is an 
HDFS-private class; if it confuses future developers, the class javadoc should 
make it fairly clear. The same goes for the method names: for example, instead 
of {{getVerifyClusterSetupSupportsEnabledEcPoliciesResult}}, I think 
{{getECTopologyVerifierResult}} should be OK, or even {{verifyECWithTopology}}.
- There's an unnecessary change in {{ECBlockGroupsMBean}}.
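
To make the locking point concrete, here's a rough sketch of the NN-side entry 
point I'm picturing, using the names suggested above. This is only a sketch: 
the method and result class names are placeholders, and the exact FSNamesystem 
helpers may differ slightly from what's in the patch.

{code:java}
// Sketch only -- names are placeholders, not the final API. The point is the
// read-lock pattern: ErasureCodingPolicyManager is read under the FSN read
// lock, while plain DatanodeManager queries don't need it.
ECTopologyVerifierResult getEcTopologyVerifierResult() throws IOException {
  checkOperation(OperationCategory.READ);
  readLock();
  try {
    checkOperation(OperationCategory.READ);
    // Reading the enabled policies touches namesystem-internal state.
    ErasureCodingPolicy[] enabled =
        ErasureCodingPolicyManager.getInstance().getEnabledPolicies();
    DatanodeManager dm = blockManager.getDatanodeManager();
    return ECTopologyVerifier.verify(enabled,
        dm.getNetworkTopology().getNumOfRacks(), dm.getNumLiveDataNodes());
  } finally {
    readUnlock("getEcTopologyVerifierResult");
  }
}
{code}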

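And roughly what I'm picturing for the verifier itself: the counts come 
straight from DNManager/{{NetworkTopology}}, so no per-DataNode iteration (and 
no {{reportSet.toArray}}) is needed. Again, the class/method/result names are 
only suggestions, and the exact thresholds and message wording are up to you.

{code:java}
// Sketch of the NN-side check. Names are suggestions only; the exact rack
// threshold is a detail for the verifier to decide.
static ECTopologyVerifierResult verify(ErasureCodingPolicy[] enabledPolicies,
    int numRacks, int numDataNodes) {
  for (ErasureCodingPolicy policy : enabledPolicies) {
    final int minDNs = policy.getNumDataUnits() + policy.getNumParityUnits();
    if (numDataNodes < minDNs) {
      return new ECTopologyVerifierResult(false,
          "The number of DataNodes (" + numDataNodes + ") is less than the "
              + "minimum required by policy " + policy.getName()
              + " (" + minDNs + ")");
    }
    // Illustrative rack check for BlockPlacementPolicyRackFaultTolerant:
    // with fewer racks than parity units, a single rack failure may take out
    // more block-group units than the policy can reconstruct.
    if (numRacks < policy.getNumParityUnits()) {
      return new ECTopologyVerifierResult(false,
          "The number of racks (" + numRacks + ") is less than the number "
              + "of parity units (" + policy.getNumParityUnits()
              + ") required by policy " + policy.getName());
    }
  }
  return new ECTopologyVerifierResult(true,
      "The cluster setup can support all enabled EC policies");
}
{code}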
> Add a tool to check rack configuration against EC policies
> ----------------------------------------------------------
>
>                 Key: HDFS-12946
>                 URL: https://issues.apache.org/jira/browse/HDFS-12946
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: erasure-coding
>            Reporter: Xiao Chen
>            Assignee: Kitti Nanasi
>            Priority: Major
>         Attachments: HDFS-12946.01.patch, HDFS-12946.02.patch, 
> HDFS-12946.03.patch, HDFS-12946.04.fsck.patch, HDFS-12946.05.patch
>
>
> From testing we have seen setups with problematic racks / datanodes that 
> would not suffice for basic EC usage. These are usually found out only after 
> the tests fail.
> We should provide a way to check this beforehand.
> Some scenarios:
> - not enough datanodes compared to EC policy's highest data+parity number
> - not enough racks to satisfy BPPRackFaultTolerant
> - highly uneven racks to satisfy BPPRackFaultTolerant
> - highly uneven racks (so that BPP's considerLoad logic may exclude some busy 
> nodes on the rack, resulting in #2)


