[ 
https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981973#action_12981973
 ] 

Todd Lipcon commented on HDFS-1547:
-----------------------------------

- DECOM_COMPARATOR should probably have some javadoc; it's not obvious from the 
name what it does. (Why is there a sort order distinction only for 
decommissioned nodes and not for decommission_ing_ nodes? I thought this patch 
wanted to make decommissioning nodes sort lower for block locations also.)

Maybe a better name would be DECOMMISSIONED_AT_END_COMPARATOR or something? 
It's a bit long, but it's not often used and it says more clearly what it does.
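
To make the naming point concrete, here is a minimal sketch of what a comparator with that name might look like. It is not the code from the patch: NodeSketch, AdminState and sortedNames are hypothetical stand-ins for the real datanode descriptor, and it assumes Java 8 or later. Note that it deliberately treats decommission_ing_ nodes like normal ones, which is exactly the behavior the question above is asking about.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a datanode descriptor; not the real HDFS class. */
final class NodeSketch {
  enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

  final String name;
  final AdminState state;

  NodeSketch(String name, AdminState state) {
    this.name = name;
    this.state = state;
  }

  /**
   * Orders decommissioned nodes after all other nodes. Nodes with the same
   * "decommissioned or not" status compare as equal, so a stable sort keeps
   * their existing relative order. Decommissioning (in-progress) nodes are
   * NOT pushed back by this comparator.
   */
  static final Comparator<NodeSketch> DECOMMISSIONED_AT_END_COMPARATOR =
      Comparator.comparing((NodeSketch n) -> n.state == AdminState.DECOMMISSIONED);

  /** Returns the node names after sorting a copy of the input list. */
  static List<String> sortedNames(List<NodeSketch> nodes) {
    List<NodeSketch> copy = new ArrayList<>(nodes);
    copy.sort(DECOMMISSIONED_AT_END_COMPARATOR); // List.sort is stable
    List<String> names = new ArrayList<>();
    for (NodeSketch n : copy) {
      names.add(n.name);
    }
    return names;
  }
}
```

With a javadoc like the one above, a reader can tell from the declaration alone that only fully decommissioned nodes move to the end of the list.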

- spurious whitespace changes on the setDatanodeDead() function and on the 
javadoc for handleHeartbeat

- in generateNodesList, the word decommissioned is misspelled at one point with 
too few 's'es
- in MiniDFSCluster.setupDatanodeAddress, you can use conf.getTrimmed instead 
of manually calling trim()
- the getFreeSocketPort() trick seems like it's not likely to work repeatably - 
isn't there a high likelihood that two datanodes would pick the same free port, 
since you don't track "claimed" ports anywhere? Or that one of these ports 
might later get claimed by one of the many other daemons running on ephemeral 
ports in a mini cluster?
- when the MiniDFS cluster is constructed, shouldn't you clear out the 
dfs.hosts file? Otherwise you're relying on the test case itself to clean 
itself up between runs (which differs from the rest of minidfs's storage 
handling)
- in the test case's verifyStats method, it seems we should sleep for at least 
some number of millis, or write a function which will wait for heartbeats (e.g. 
like TestDatanodeRegistration.java:62). Otherwise the 10 quick iterations might 
all run before any heartbeats have actually come in.
- is there a test case anywhere that covers what happens when a decommissioned 
node connects to the namenode? E.g. after a NN restart, when a node is in both 
the include and the decom (exclude) lists?
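
Two of the points above (free ports that are never tracked as "claimed", and verifyStats racing the heartbeats) could be addressed with small test helpers along these lines. This is only a sketch under assumptions: MiniClusterTestSketch, claimFreePort and waitFor are hypothetical names, not existing MiniDFSCluster methods, and it assumes Java 8 or later. In a real test the condition passed to waitFor would check whatever the namenode exposes about received heartbeats.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.HashSet;
import java.util.Set;
import java.util.function.BooleanSupplier;

/** Hypothetical test helpers; none of these names exist in MiniDFSCluster. */
final class MiniClusterTestSketch {
  // Ports already handed out by claimFreePort() in this JVM.
  private static final Set<Integer> CLAIMED_PORTS = new HashSet<>();

  /**
   * Asks the OS for a free port and records it, so this helper never returns
   * the same port twice. Another process can still grab the port once the
   * probe socket is closed, so this narrows the race but does not remove it.
   */
  static synchronized int claimFreePort() throws IOException {
    for (int attempt = 0; attempt < 100; attempt++) {
      int port;
      try (ServerSocket probe = new ServerSocket(0)) {
        port = probe.getLocalPort();
      }
      if (CLAIMED_PORTS.add(port)) {
        return port; // first time this helper has seen the port
      }
    }
    throw new IOException("could not find an unclaimed free port");
  }

  /**
   * Polls the condition until it holds or the timeout expires, instead of
   * running a fixed number of back-to-back iterations.
   *
   * @return true if the condition became true, false on timeout
   */
  static boolean waitFor(BooleanSupplier condition, long pollMillis,
      long timeoutMillis) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.currentTimeMillis() >= deadline) {
        return false;
      }
      Thread.sleep(pollMillis);
    }
  }
}
```

Even with the claimed-port set, binding each datanode to port 0 and reading back the port it actually got would avoid the probe-then-bind race entirely; the set only guards against two datanodes in the same JVM being handed the same number.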

> Improve decommission mechanism
> ------------------------------
>
>                 Key: HDFS-1547
>                 URL: https://issues.apache.org/jira/browse/HDFS-1547
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>             Fix For: 0.23.0
>
>         Attachments: HDFS-1547.1.patch, HDFS-1547.2.patch, HDFS-1547.3.patch, 
> HDFS-1547.patch, show-stats-broken.txt
>
>
> Current decommission mechanism driven using exclude file has several issues. 
> This bug proposes some changes in the mechanism for better manageability. See 
> the proposal in the next comment for more details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
