[
https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504323#comment-14504323
]
Vinayakumar B commented on HDFS-7993:
-------------------------------------
Hi andreina,
The patch looks almost good with the updated test.
One update is still required, as mentioned in the previous comment.
bq. Could the test fail if the node becomes decommissioned right after it
checks isDecommissionInProgress? Otherwise, it looks good.
Yes, you are right [~mingma]. Since 2 DNs are already available, by the
time fsck is executed and its output checked, the decommissioning DN might
already have moved to DECOMMISSIONED.
To slow it down, I recommend starting only one datanode when the cluster is
first brought up. Once the DECOMMISSIONING state is verified in fsck, start
another datanode and verify the DECOMMISSIONED state.
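To illustrate why the ordering helps, here is a minimal toy model in plain Java. It is only a sketch and not Hadoop code: the class {{ToyDecommissionModel}}, its methods, and the single-block setup are hypothetical names made up for this example. The assumption it encodes is that a decommissioning node cannot finish until some other in-service node can take over its replica.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the decommission sequence; hypothetical, not Hadoop code. */
final class ToyDecommissionModel {
    enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

    // Datanodes that are running and not being decommissioned.
    private final Set<String> inService = new HashSet<>();
    // Datanodes that hold a replica of the single block in this model.
    private final Set<String> replicaHolders = new HashSet<>();
    // Admin state of the one node we decommission.
    private AdminState state = AdminState.NORMAL;

    void startDataNode(String name) {
        inService.add(name);
    }

    void writeBlock(String name) {
        if (!inService.contains(name)) {
            throw new IllegalArgumentException("unknown datanode: " + name);
        }
        replicaHolders.add(name);
    }

    void startDecommission(String name) {
        if (!inService.remove(name)) {
            throw new IllegalArgumentException("unknown datanode: " + name);
        }
        state = AdminState.DECOMMISSION_INPROGRESS;
    }

    /** One pass of an imaginary decommission monitor. */
    void monitorPass() {
        if (state != AdminState.DECOMMISSION_INPROGRESS || inService.isEmpty()) {
            return; // nothing to do, or no target for the replica yet
        }
        // Copy the replica to one in-service node, then finish.
        replicaHolders.add(inService.iterator().next());
        state = AdminState.DECOMMISSIONED;
    }

    AdminState state() {
        return state;
    }
}
```

With a single datanode, any number of {{monitorPass()}} calls leaves the state at {{DECOMMISSION_INPROGRESS}}, so the first check has a stable window. Only after a second {{startDataNode(...)}} call can the next pass reach {{DECOMMISSIONED}}.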
A few more nits to be fixed in the test.
1. Unnecessary assertion {{+ assertNotNull("Failed Cluster Creation",
cluster);}}: if building the cluster fails, it will throw an exception directly.
2. For the current usage of DFSTestUtil, there is no need to build it using the
Builder; the static methods can be used directly.
{code}
+ DFSTestUtil util =
+     new DFSTestUtil.Builder().setName(getClass().getSimpleName()).setNumFiles(1).build();
{code}
3. {{+ int count = 0;}} is not used. Either this count should be used in the
while loop as a condition, or it should be removed. Also, I recommend adding a
timeout to the test, e.g. {{@Test(timeout = ...)}}.
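For point 3, a bounded wait could look roughly like the sketch below. It is plain Java with no Hadoop dependencies; {{BoundedWait}}, {{waitFor}}, {{maxAttempts}} and {{sleepMillis}} are hypothetical names chosen for illustration and are not taken from the patch. The idea is simply that {{count}} becomes part of the loop's exit condition, so the test cannot spin forever.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper showing a counter used as a loop bound. */
final class BoundedWait {
    private BoundedWait() {
    }

    /**
     * Polls the condition up to maxAttempts times, sleeping between polls.
     * Returns true as soon as the condition holds, false if attempts run out
     * or the thread is interrupted.
     */
    static boolean waitFor(BooleanSupplier condition, int maxAttempts, long sleepMillis) {
        int count = 0;
        while (!condition.getAsBoolean()) {
            // The counter bounds the loop instead of sitting unused.
            if (++count >= maxAttempts) {
                return false;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }
}
```

In the test, the condition would be whatever check the existing while loop performs, and the returned boolean can be asserted on. The test-level timeout remains useful as a second safety net.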
> Incorrect descriptions in fsck when nodes are decommissioned
> ------------------------------------------------------------
>
> Key: HDFS-7993
> URL: https://issues.apache.org/jira/browse/HDFS-7993
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Ming Ma
> Assignee: J.Andreina
> Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch,
> HDFS-7993.4.patch, HDFS-7993.5.patch
>
>
> When you run fsck with "-files" or "-racks", you will get something like
> below if one of the replicas is decommissioned.
> {noformat}
> blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
> {noformat}
> That is because in NamenodeFsck, the repl count comes from the live replica
> count, while the listed nodes come from LocatedBlock, which includes
> decommissioned nodes.
> Another issue in NamenodeFsck is BlockPlacementPolicy's verifyBlockPlacement
> verifies LocatedBlock that includes decommissioned nodes. However, it seems
> better to exclude the decommissioned nodes in the verification; just like how
> fsck excludes decommissioned nodes when it checks for under-replicated blocks.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)