[
https://issues.apache.org/jira/browse/HDFS-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064963#comment-13064963
]
John George commented on HDFS-2114:
-----------------------------------
Mat, thanks a lot for reviewing this patch. I will post a newer patch tomorrow
with the changes to checkFile(). I have tried to address your comments below.
> 1. I know you didn't write TestDecommission.checkFile(), but your addition in
> this patch of the {{isNodeDown}} flag raises a number of questions:
> * Aren't (isNodeDown && nodes[j].getName().equals(downnode)) and
>   (nodes[j].getName().equals(downnode)) always the same? So is the first use
>   of {{isNodeDown}} even necessary?
The first use of {{isNodeDown}} is necessary because {{downnode}} could be null
when we are checking for recommission, since no node is down in that case.
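For context, here is roughly how checkFile() gets called in the two cases (the
signature and call sites below are assumptions based on the existing test, not
the final patch):
{code}
// Decommission case: one datanode was decommissioned, so pass its name.
checkFile(fileSys, file1, replicas, decomNode.getName(), numDatanodes);

// Recommission case: no node is down, so downnode is null and the
// isNodeDown flag keeps checkFile() from comparing names at all.
checkFile(fileSys, file1, replicas, null, numDatanodes);
{code}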
> * Can there be any cases where (nodes[j].getName().equals(downnode)) and
> (nodes[j].isDecommissioned()) are different? So can't the following two
> blocks
> be merged? And the location of the "is decommissioned" log message further
> confuses this issue.
> {code}
> if (isNodeDown && nodes[j].getName().equals(downnode)) {
>   hasdown++;
>   LOG.info("Block " + blk.getBlock() + " replica "
>       + nodes[j].getName() + " is decommissioned.");
> }
> if (nodes[j].isDecommissioned()) {
>   if (firstDecomNodeIndex == -1) {
>     firstDecomNodeIndex = j;
>   }
>   continue;
> }
> {code}
I think you are right. I will merge the two blocks and add an assertion that
fails if the two conditions ever disagree.
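Roughly what I have in mind for the merged block (just a sketch against the
current checkFile() loop, not the final patch):
{code}
if (nodes[j].isDecommissioned()) {
  if (isNodeDown) {
    // The two conditions should agree: the only decommissioned replica
    // this test expects is the node that was taken down.
    assertEquals("Unexpected decommissioned replica",
        downnode, nodes[j].getName());
    hasdown++;
    LOG.info("Block " + blk.getBlock() + " replica "
        + nodes[j].getName() + " is decommissioned.");
  }
  if (firstDecomNodeIndex == -1) {
    firstDecomNodeIndex = j;
  }
  continue;
}
{code}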
> * What is the purpose of the assertion
> {code}
> if (isNodeDown)
>   assertEquals("Decom node is not at the end", firstDecomNodeIndex, -1);
> {code}
My understanding is that this assert ensures that the current {{blk}} has only
ONE replica that is in the decommissioned state.
> And why does it even work, since the node to decommission is chosen at random?
> * And in the same block, why is it important to condition it on
> {{isNodeDown}}, since (!isNodeDown) implies there shouldn't be any
> decommissioned nodes? So the second use of {{isNodeDown}} also seems
> unnecessary.
Because we don't care about this check in cases where no node is down.
>
> 2. Regarding the timeout: In TestDecommission, you appropriately set
> BLOCKREPORT_INTERVAL_MSEC down to 1 sec to match the HEARTBEAT_INTERVAL. You
> may also want to consider DFS_NAMENODE_REPLICATION_INTERVAL_KEY (default 3
> sec). Adjusting this value to 1 might allow testRecommission() to run in 5
> sec instead of 10.
>
Will try this.
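For reference, I am thinking of something like this in the test setup (a
sketch; the key names are from DFSConfigKeys, and I still need to verify the
timing actually improves):
{code}
// Run the namenode's replication monitor every second instead of the
// default 3 seconds, to match the heartbeat/blockreport intervals.
conf.setInt(DFSConfigKeys.DFS_NAMENODE_REPLICATION_INTERVAL_KEY, 1);
conf.setLong(DFSConfigKeys.DFS_BLOCKREPORT_INTERVAL_MSEC_KEY,
    BLOCKREPORT_INTERVAL_MSEC);
conf.setLong(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, HEARTBEAT_INTERVAL);
{code}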
> 3. I wish there were a clear way to share the duplicated code between
> testDecommission and testRecommission, but I couldn't see an obviously better
> choice. Can you think of a way to improve this?
OK, I will see if the common setup and verification steps can be pulled into a
shared helper.
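One possibility is a shared driver along these lines (the method and field
names here are hypothetical, borrowed loosely from the existing test):
{code}
/**
 * Hypothetical shared driver for testDecommission()/testRecommission():
 * decommission a node, verify the file's replicas, and optionally
 * recommission the node and verify again.
 */
private void decommissionAndVerify(FileSystem fileSys, Path file,
    int replicas, boolean thenRecommission) throws IOException {
  DatanodeInfo downnode = decommissionNode(fileSys, null);
  waitNodeState(downnode, AdminStates.DECOMMISSIONED);
  checkFile(fileSys, file, replicas, downnode.getName(), numDatanodes);
  if (thenRecommission) {
    recommissionNode(downnode);
    waitNodeState(downnode, AdminStates.NORMAL);
    checkFile(fileSys, file, replicas, null, numDatanodes);
  }
}
{code}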
> re-commission of a decommissioned node does not delete excess replica
> ---------------------------------------------------------------------
>
> Key: HDFS-2114
> URL: https://issues.apache.org/jira/browse/HDFS-2114
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: John George
> Assignee: John George
> Attachments: HDFS-2114-2.patch, HDFS-2114-3.patch, HDFS-2114.patch
>
>
> If a decommissioned node is removed from the decommissioned list, namenode
> does not delete the excess replicas it created while the node was
> decommissioned.