[ https://issues.apache.org/jira/browse/HDFS-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109166#comment-16109166 ]
Wei-Chiu Chuang commented on HDFS-12182:
----------------------------------------

Thanks for the new patch. I reviewed it again and found a few nits, plus some additional comments unrelated to your patch.

The findbugs warnings are unrelated; they are caused by HDFS-11696.

{code:title=TestBlockManager#testMetaSaveMissingReplicas}
if (reader != null) {
  reader.close();
}
{code}
This check is not needed. If reader is ever null, it is most likely because initialization failed, and that should already have thrown an exception. The try {} block comes after the initialization, so it won't catch that failure anyway.

One typo:
{code}
assertTrue("Metasave output should had …")
{code}
"had" -> "have"

After the patch, the output of metaSave is:
{noformat}
Live Datanodes: 0
Dead Datanodes: 0
Metasave: Blocks waiting for reconstruction: 0
Metasave: Blocks currently missing: 1
file16387: blk_0_1 MISSING (replicas: l: 0 d: 0 c: 2 e: 0) 1.1.1.1:9866(corrupt) (block deletions maybe out of date) : 2.2.2.2:9866(corrupt) (block deletions maybe out of date) :
Mis-replicated blocks that have been postponed:
Metasave: Blocks being reconstructed: 0
Metasave: Blocks 0 waiting deletion from 0 datanodes.
Corrupt Blocks:
Block=0 Node=1.1.1.1:9866 StorageID=s1 StorageState=NORMAL TotalReplicas=2 Reason=GENSTAMP_MISMATCH
Block=0 Node=2.2.2.2:9866 StorageID=s2 StorageState=NORMAL TotalReplicas=2 Reason=GENSTAMP_MISMATCH
Metasave: Number of datanodes: 0
{noformat}

(The following is unrelated to this jira.)
Looking at the output, it is not user friendly: the meaning of "(replicas: l: 0 d: 0 c: 2 e: 0)" is not obvious without looking at the code. Also, it should print maintenance mode replicas.
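To illustrate the readability point, here is a minimal sketch of a more explicit replica summary. It assumes that l, d, c and e stand for live, decommissioned, corrupt and excess replicas, which should be confirmed against dumpBlockMeta. ReplicaSummary and describe are hypothetical names, not part of BlockManager, and the maintenance count is an illustrative extra parameter.

```java
// Hypothetical helper, not part of Hadoop: spells out the replica counters
// that metaSave currently prints as "l: d: c: e:".
class ReplicaSummary {

    // The meaning of each counter is an assumption based on the short labels.
    static String describe(int live, int decommissioned, int corrupt,
                           int excess, int maintenance) {
        return String.format(
            "(replicas: live: %d decommissioned: %d corrupt: %d excess: %d maintenance: %d)",
            live, decommissioned, corrupt, excess, maintenance);
    }

    public static void main(String[] args) {
        // Same counts as the blk_0_1 line above, plus a maintenance count of 0.
        System.out.println(describe(0, 0, 2, 0, 0));
    }
}
```

Spelling the labels out makes each line longer, but it can be read without opening the source.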
> BlockManager.metaSave does not distinguish between "under replicated" and "missing" blocks
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12182
>                 URL: https://issues.apache.org/jira/browse/HDFS-12182
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Trivial
>              Labels: newbie
>             Fix For: 3.0.0-alpha3
>
>         Attachments: HDFS-12182.001.patch, HDFS-12182.002.patch, HDFS-12182.003.patch
>
>
> Currently, the *BlockManager.metaSave* method (which is called by the "-metasave" dfs
> CLI command) reports both "under replicated" and "missing" blocks under the same
> metric, *Metasave: Blocks waiting for reconstruction:*, as shown in the code
> snippet below:
> {noformat}
>     synchronized (neededReconstruction) {
>       out.println("Metasave: Blocks waiting for reconstruction: "
>           + neededReconstruction.size());
>       for (Block block : neededReconstruction) {
>         dumpBlockMeta(block, out);
>       }
>     }
> {noformat}
> *neededReconstruction* is an instance of *LowRedundancyBlocks*, which
> currently wraps 5 priority queues. 4 of these queues store different
> under replicated scenarios, but the 5th one is dedicated to corrupt/missing
> blocks.
> Thus, the metasave report may suggest that some corrupt blocks are just under
> replicated. This can be misleading for admins and operators trying to track
> block missing/corruption issues, and/or other issues related to
> *BlockManager* metrics.
> I would like to propose a patch with trivial changes that would report
> corrupt blocks separately.
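
The split described in the quoted issue can be sketched with a toy model. This is not the actual patch: MetaSaveSketch, its fields and its methods are hypothetical stand-ins for LowRedundancyBlocks, and the choice of index 4 for the corrupt/missing queue is an assumption based on the "5th queue" wording above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a structure that wraps 5 priority queues, the last of
// which holds corrupt/missing blocks. Not the real LowRedundancyBlocks.
class MetaSaveSketch {
    static final int LEVELS = 5;
    static final int CORRUPT_LEVEL = 4; // assumed position of the corrupt/missing queue

    private final List<List<String>> queues = new ArrayList<>();

    MetaSaveSketch() {
        for (int i = 0; i < LEVELS; i++) {
            queues.add(new ArrayList<>());
        }
    }

    void add(String block, int level) {
        queues.get(level).add(block);
    }

    // Counts the four low-redundancy queues and the corrupt queue separately,
    // instead of reporting the combined size under a single metric.
    String report() {
        int lowRedundancy = 0;
        for (int i = 0; i < LEVELS; i++) {
            if (i != CORRUPT_LEVEL) {
                lowRedundancy += queues.get(i).size();
            }
        }
        int missing = queues.get(CORRUPT_LEVEL).size();
        return "Metasave: Blocks waiting for reconstruction: " + lowRedundancy + "\n"
            + "Metasave: Blocks currently missing: " + missing;
    }

    public static void main(String[] args) {
        MetaSaveSketch sketch = new MetaSaveSketch();
        sketch.add("blk_0_1", CORRUPT_LEVEL);
        System.out.println(sketch.report());
    }
}
```

With a single block in the corrupt queue, the two counters come out as 0 and 1, which matches the first two Metasave lines of the output shown in the comment.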