[ 
https://issues.apache.org/jira/browse/HDFS-12946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652145#comment-16652145
 ] 

Xiao Chen commented on HDFS-12946:
----------------------------------

Thanks [~andrew.wang] for the comment! 

My recap: there is no existing way to check this. Patch 1 did the check 
entirely on the client side, but as you said, that can't easily be used during 
the EC policy enabling call. Moving this logic to the NN side seems like good 
reuse. Otherwise, even if we extract the logic into util functions, ecadmin 
would still need to call all these RPCs.

Seems like we're left with 2 options here:
# Do this client-side, extract the logic and accept the fact that enablePolicy 
may call other RPCs for validation. (If I understand Andrew's "this would be a 
more generally useful admin interface" comment correctly.)
# Do it via this new RPC. We can work on details to make the return value more 
reasonable (e.g. enum-up the int return value; on MXBean, return a String which 
is built based on the int/enum value).

I'm voting for #2 because I think exposing this via metrics is more flexible 
and usable by various kinds of downstream consumers. Thoughts?
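
To make the "enum-up the int return value" idea in option 2 a bit more concrete, here is a minimal sketch. All names in it (EcTopologyResult, fromCode, toMetricString) and the specific codes are hypothetical and not taken from any of the attached patches; it only illustrates how an int code could map to an enum, and how an MXBean string could be built from that enum.

```java
/** Hypothetical result codes; a real patch may define different ones. */
enum EcTopologyResult {
  OK(0, "The cluster setup can support all enabled EC policies"),
  NOT_ENOUGH_DATANODES(1, "Not enough DataNodes for the widest enabled EC policy"),
  NOT_ENOUGH_RACKS(2, "Not enough racks for the widest enabled EC policy");

  private final int code;
  private final String description;

  EcTopologyResult(int code, String description) {
    this.code = code;
    this.description = description;
  }

  int getCode() {
    return code;
  }

  /** Maps the raw int (as it might travel over an RPC) back to the enum. */
  static EcTopologyResult fromCode(int code) {
    for (EcTopologyResult r : values()) {
      if (r.code == code) {
        return r;
      }
    }
    throw new IllegalArgumentException("Unknown result code: " + code);
  }

  /** Builds the human-readable string an MXBean attribute could return. */
  String toMetricString() {
    return name() + ": " + description;
  }
}

final class EcTopologyResultDemo {
  public static void main(String[] args) {
    // Simulate receiving an int code and rendering it for a metrics consumer.
    System.out.println(EcTopologyResult.fromCode(2).toMetricString());
  }
}
```

With something along these lines, ecadmin and the MXBean would share one definition of the possible outcomes instead of each interpreting a bare int.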

> Add a tool to check rack configuration against EC policies
> ----------------------------------------------------------
>
>                 Key: HDFS-12946
>                 URL: https://issues.apache.org/jira/browse/HDFS-12946
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: erasure-coding
>            Reporter: Xiao Chen
>            Assignee: Kitti Nanasi
>            Priority: Major
>         Attachments: HDFS-12946.01.patch, HDFS-12946.02.patch, 
> HDFS-12946.03.patch, HDFS-12946.04.fsck.patch
>
>
> From testing we have seen setups with problematic racks / datanodes that 
> would not suffice for basic EC usage. These are usually found only after the 
> tests have failed.
> We should provide a way to check this beforehand.
> Some scenarios:
> - not enough datanodes compared to EC policy's highest data+parity number
> - not enough racks to satisfy BPPRackFaultTolerant
> - highly uneven racks to satisfy BPPRackFaultTolerant
> - highly uneven racks (so that BPP's considerLoad logic may exclude some busy 
> nodes on the rack, resulting in #2)
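
As a rough illustration of the first two scenarios quoted above, the checks reduce to simple count comparisons. The sketch below is not the logic of any attached patch: the class and method names are hypothetical, and the rack formula is an assumption, namely that surviving the loss of one rack requires that no rack holds more than parityUnits blocks of a block group. The uneven-rack scenarios are not covered.

```java
final class RackCheckSketch {

  /** One DataNode per block in the widest block group (data + parity). */
  static int minDataNodes(int dataUnits, int parityUnits) {
    return dataUnits + parityUnits;
  }

  /**
   * Assumed rule: at most parityUnits blocks of a group per rack, so the
   * minimum rack count is ceil((dataUnits + parityUnits) / parityUnits).
   */
  static int minRacks(int dataUnits, int parityUnits) {
    if (parityUnits <= 0) {
      throw new IllegalArgumentException("parityUnits must be positive");
    }
    int width = dataUnits + parityUnits;
    return (width + parityUnits - 1) / parityUnits; // integer ceiling
  }

  /** Returns a short verdict for the given cluster size and EC schema. */
  static String check(int numDataNodes, int numRacks,
                      int dataUnits, int parityUnits) {
    int needDn = minDataNodes(dataUnits, parityUnits);
    if (numDataNodes < needDn) {
      return "not enough datanodes: " + numDataNodes + " < " + needDn;
    }
    int needRacks = minRacks(dataUnits, parityUnits);
    if (numRacks < needRacks) {
      return "not enough racks: " + numRacks + " < " + needRacks;
    }
    return "ok";
  }

  public static void main(String[] args) {
    // A 6 data + 3 parity schema on 9 DataNodes spread over 2 racks.
    System.out.println(check(9, 2, 6, 3));
  }
}
```

In a real check the dataUnits and parityUnits values would come from the enabled EC policies, using the policy with the highest data+parity number.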



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

