[
https://issues.apache.org/jira/browse/HDFS-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109166#comment-16109166
]
Wei-Chiu Chuang edited comment on HDFS-12182 at 8/1/17 4:08 PM:
----------------------------------------------------------------
Thanks for the new patch.
I reviewed again and saw a few nits, plus additional comments unrelated to your
patch:
The findbugs warnings are unrelated; they are caused by HDFS-11696.
{code:title=TestBlockManager#testMetaSaveMissingReplicas}
if (reader != null) {
reader.close();
}
{code}
This check is not needed. If reader were ever null, the likely cause would be a
failed initialization, which should have thrown an exception. The try {}
block starts after the initialization, so it wouldn't catch that anyway.
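To illustrate the point, here is a minimal sketch of the pattern, not the actual test code. The class and method names are hypothetical, and a StringReader stands in for whatever file the test really reads. It shows that when the reader is created before the try block, the finally clause can close it unconditionally, and that try-with-resources is an equivalent alternative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical stand-in for the test code; names are illustrative only.
class ReaderCloseSketch {

    // Initialize first, then try/finally without a null check.
    static int countLines(String text) throws IOException {
        // If construction failed, the exception would propagate from this line,
        // before the try block is entered, so the finally clause would never run.
        BufferedReader reader = new BufferedReader(new StringReader(text));
        try {
            int lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        } finally {
            // reader cannot be null here, so no "if (reader != null)" guard is needed.
            reader.close();
        }
    }

    // Alternative: try-with-resources closes the reader automatically.
    static int countLinesWithResources(String text) throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            int lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        String sample = "Live Datanodes: 0\nDead Datanodes: 0\n";
        System.out.println(countLines(sample));
        System.out.println(countLinesWithResources(sample));
    }
}
```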
One typo:
{code}
assertTrue("Metasave output should had …")
{code}
“had” -> “have”
Other than these nits the patch LGTM.
After the patch, the output of metaSave is:
{noformat}
Live Datanodes: 0
Dead Datanodes: 0
Metasave: Blocks waiting for reconstruction: 0
Metasave: Blocks currently missing: 1
file16387: blk_0_1 MISSING (replicas: l: 0 d: 0 c: 2 e: 0)
1.1.1.1:9866(corrupt) (block deletions maybe out of date) :
2.2.2.2:9866(corrupt) (block deletions maybe out of date) :
Mis-replicated blocks that have been postponed:
Metasave: Blocks being reconstructed: 0
Metasave: Blocks 0 waiting deletion from 0 datanodes.
Corrupt Blocks:
Block=0 Node=1.1.1.1:9866 StorageID=s1 StorageState=NORMAL
TotalReplicas=2 Reason=GENSTAMP_MISMATCH
Block=0 Node=2.2.2.2:9866 StorageID=s2 StorageState=NORMAL
TotalReplicas=2 Reason=GENSTAMP_MISMATCH
Metasave: Number of datanodes: 0
{noformat}
(The following is unrelated to this jira.)
Looking at the output, it is not user friendly: the meaning of
“(replicas: l: 0 d: 0 c: 2 e: 0)” is not obvious without looking at the code.
Also, it should print maintenance mode replicas.
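As a rough sketch of what a friendlier line could look like, the snippet below expands the terse counters into named fields. It rests on an assumption: that l, d, c and e stand for live, decommissioned (or decommissioning), corrupt and excess replicas. The class, the method and the output format are hypothetical and not part of any patch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the label meanings are assumed, not confirmed by this jira.
class ReplicaSummarySketch {

    // Matches the terse counters, e.g. "l: 0 d: 0 c: 2 e: 0".
    private static final Pattern TERSE =
        Pattern.compile("l:\\s*(\\d+)\\s+d:\\s*(\\d+)\\s+c:\\s*(\\d+)\\s+e:\\s*(\\d+)");

    // Returns a spelled-out summary, or the input unchanged if it does not match.
    static String expand(String terse) {
        Matcher m = TERSE.matcher(terse);
        if (!m.find()) {
            return terse;
        }
        return String.format("live=%s, decommissioned=%s, corrupt=%s, excess=%s",
            m.group(1), m.group(2), m.group(3), m.group(4));
    }

    public static void main(String[] args) {
        System.out.println(expand("(replicas: l: 0 d: 0 c: 2 e: 0)"));
    }
}
```

A maintenance-mode counter could be added as one more named field in the same way.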
> BlockManager.metaSave does not distinguish between "under replicated" and
> "missing" blocks
> ------------------------------------------------------------------------------------------
>
> Key: HDFS-12182
> URL: https://issues.apache.org/jira/browse/HDFS-12182
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Trivial
> Labels: newbie
> Fix For: 3.0.0-alpha3
>
> Attachments: HDFS-12182.001.patch, HDFS-12182.002.patch,
> HDFS-12182.003.patch
>
>
> Currently, the *BlockManager.metaSave* method (which is called by the "-metasave" dfs
> CLI command) reports both "under replicated" and "missing" blocks under the same
> metric, *Metasave: Blocks waiting for reconstruction:*, as shown in the code
> snippet below:
> {noformat}
> synchronized (neededReconstruction) {
>   out.println("Metasave: Blocks waiting for reconstruction: "
>       + neededReconstruction.size());
>   for (Block block : neededReconstruction) {
>     dumpBlockMeta(block, out);
>   }
> }
> {noformat}
> *neededReconstruction* is an instance of *LowRedundancyBlocks*, which
> currently wraps 5 priority queues. 4 of these queues hold blocks in different
> under-replicated states, but the 5th one is dedicated to corrupt/missing
> blocks.
> Thus, the metasave report may suggest that some corrupt blocks are merely under
> replicated. This can mislead admins and operators trying to track
> missing/corrupt block issues, and/or other issues related to
> *BlockManager* metrics.
> I would like to propose a patch with trivial changes that would report
> corrupt blocks separately.
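The idea in the description can be sketched as follows. This is a simplified model, not the actual patch. Plain lists of block names stand in for the five priority queues, the class and method names are hypothetical, and treating the last queue as the corrupt/missing one is an assumption taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the report; not the real BlockManager code.
class MetaSaveSketch {

    // Assumption: the fifth queue (index 4) holds corrupt/missing blocks.
    static final int CORRUPT_QUEUE = 4;

    // Builds a report that counts the corrupt queue separately from the others.
    static String report(List<List<String>> queues) {
        List<String> waiting = new ArrayList<>();
        for (int i = 0; i < queues.size(); i++) {
            if (i != CORRUPT_QUEUE) {
                waiting.addAll(queues.get(i));
            }
        }
        List<String> missing = queues.get(CORRUPT_QUEUE);

        StringBuilder out = new StringBuilder();
        out.append("Metasave: Blocks waiting for reconstruction: ")
           .append(waiting.size()).append('\n');
        for (String block : waiting) {
            out.append(block).append('\n');
        }
        out.append("Metasave: Blocks currently missing: ")
           .append(missing.size()).append('\n');
        for (String block : missing) {
            out.append(block).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<List<String>> queues = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            queues.add(new ArrayList<>());
        }
        queues.get(CORRUPT_QUEUE).add("blk_0_1");
        System.out.print(report(queues));
    }
}
```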