[ https://issues.apache.org/jira/browse/HDFS-12946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651092#comment-16651092 ]
Xiao Chen commented on HDFS-12946:
----------------------------------
Thanks [~knanasi] for moving forward and providing patches for demonstration.
Good discussions and work. :)
I think the most intuitive and common way is via RPC; we have many similar
calls that query the NN. The difference here is that this isn't a direct NN
status, but a result calculated from the NN topology and the directory's EC
policy. Not exposing it to DFSClient is a good idea; I think ECAdmin is enough
for this call.
The fsck approach looks tidy in terms of code change but, as you said, could
confuse users. In a sense it is closer to the JMX idea, because the work is done
via a servlet and so bypasses the regular RPC.
It's a hard call: I feel this EC-related command should live under ECAdmin, but
the fsck implementation would be cleaner. It would be nice to keep it in
ECAdmin and have it call fsck, but that's definitely hackier, and I don't
think we have done that before.
[~andrew.wang], any advice / preference?
> Add a tool to check rack configuration against EC policies
> ----------------------------------------------------------
>
> Key: HDFS-12946
> URL: https://issues.apache.org/jira/browse/HDFS-12946
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: erasure-coding
> Reporter: Xiao Chen
> Assignee: Kitti Nanasi
> Priority: Major
> Attachments: HDFS-12946.01.patch, HDFS-12946.02.patch,
> HDFS-12946.03.patch, HDFS-12946.04.fsck.patch
>
>
> From testing we have seen setups with problematic racks / datanodes that
> cannot support basic EC usage. These are usually discovered only after the
> tests have failed.
> We should provide a way to check this beforehand.
> Some scenarios:
> - not enough datanodes compared to EC policy's highest data+parity number
> - not enough racks to satisfy BPPRackFaultTolerant
> - racks too uneven to satisfy BPPRackFaultTolerant
> - highly uneven racks (so that BPP's considerLoad logic may exclude some busy
> nodes on the rack, resulting in #2)
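The first three scenarios can be sketched as a standalone check. This is only
a rough illustration, not the code from any of the attached patches: the class
name, the method name and the input map are hypothetical, and no Hadoop APIs
are used. It assumes one block of a group per datanode, and that tolerating
the loss of a rack means no rack may hold more than "parity" blocks of a
group, so at least ceil((data + parity) / parity) racks are needed. The
considerLoad scenario is not modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and rules are assumptions, not HDFS code.
final class EcTopologyCheckSketch {

    /**
     * Returns human-readable problems; an empty list means the assumed
     * checks passed for a policy with the given data/parity unit counts.
     */
    static List<String> check(Map<String, Integer> dataNodesPerRack,
                              int dataUnits, int parityUnits) {
        if (dataUnits <= 0 || parityUnits <= 0) {
            throw new IllegalArgumentException("units must be positive");
        }
        List<String> problems = new ArrayList<>();
        int blocksPerGroup = dataUnits + parityUnits;

        // Scenario 1: fewer datanodes than data + parity blocks.
        int totalDataNodes = 0;
        for (int n : dataNodesPerRack.values()) {
            totalDataNodes += n;
        }
        if (totalDataNodes < blocksPerGroup) {
            problems.add("not enough datanodes: " + totalDataNodes
                + " < " + blocksPerGroup);
        }

        // Scenario 2 (assumed rule): at most parityUnits blocks per rack,
        // so ceil(blocksPerGroup / parityUnits) racks are required.
        int minRacks = (blocksPerGroup + parityUnits - 1) / parityUnits;
        if (dataNodesPerRack.size() < minRacks) {
            problems.add("not enough racks: " + dataNodesPerRack.size()
                + " < " + minRacks);
        }

        // Scenario 3 (assumed rule): each rack can take at most
        // min(datanodes on rack, parityUnits) blocks of one group.
        int placeable = 0;
        for (int n : dataNodesPerRack.values()) {
            placeable += Math.min(n, parityUnits);
        }
        if (placeable < blocksPerGroup) {
            problems.add("racks too small or uneven: only " + placeable
                + " of " + blocksPerGroup + " blocks can be spread");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Nine datanodes on three racks, but concentrated on one rack.
        Map<String, Integer> racks = new LinkedHashMap<>();
        racks.put("/rack1", 7);
        racks.put("/rack2", 1);
        racks.put("/rack3", 1);
        // 6 data + 3 parity units, as in an RS-6-3 style policy.
        for (String problem : check(racks, 6, 3)) {
            System.out.println(problem);
        }
    }
}
```

Whether such a check is reached via RPC, ECAdmin or fsck is independent of
the calculation itself; the sketch only covers the calculation.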