[ 
https://issues.apache.org/jira/browse/HDFS-729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833658#action_12833658
 ] 

Rodrigo Schmidt commented on HDFS-729:
--------------------------------------

I was looking at the current patch and I think I found a bug in it.

In UnderReplicatedBlocks.java, the following method was added:

+  /**
+   * Return an iterator of all blocks that have no valid replicas.
+   * These are either blocks with no replicas or all existing replicas
+   * are corrupted. Such blocks are at level 2.
+   */
+  public synchronized Iterator<Block> iteratorBadBlocks() {
+    return priorityQueues.get(2).iterator();
+  }

It assumes all blocks in queue 2 have 0 replicas. However, according to 
getPriority() in the same source file, level 2 is also used for blocks whose 
number of replicas times 3 is greater than or equal to the expected number of 
replicas:

    } else if(curReplicas*3<expectedReplicas) {
      return 1;
    } else {
      return 2;
    }

So, if a block has 2 replicas but is expected to have 3, it will also be kept 
in the queue with priority 2. I'm fixing that by adding an extra check on the 
real number of replicas a block has before adding it to the list returned by 
BlockManager.getCorruptFiles() (previously BlockManager.getBadFiles()). 
Please let me know what you guys think about it.
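
Roughly, the extra check I have in mind would look like the sketch below 
(countLiveReplicas() and addCorruptFile() are only illustrative placeholders 
to show the idea, not the actual methods in the patch):

  // Sketch only: walk the level-2 queue and keep a block only if it really
  // has no live replicas left. countLiveReplicas() and addCorruptFile() are
  // illustrative placeholders.
  private void collectCorruptFiles() {
    Iterator<Block> it = neededReplications.iteratorBadBlocks();
    while (it.hasNext()) {
      Block blk = it.next();
      // Level 2 can also hold merely under-replicated blocks, so skip any
      // block that still has at least one live (non-corrupt) replica.
      if (countLiveReplicas(blk) == 0) {
        addCorruptFile(blk);
      }
    }
  }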



> fsck option to list only corrupted files
> ----------------------------------------
>
>                 Key: HDFS-729
>                 URL: https://issues.apache.org/jira/browse/HDFS-729
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: dhruba borthakur
>            Assignee: Rodrigo Schmidt
>         Attachments: badFiles.txt, badFiles2.txt, corruptFiles.txt
>
>
> An option to fsck to list only corrupted files will be very helpful for 
> frequent monitoring.
