[ 
https://issues.apache.org/jira/browse/HDFS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881983#action_12881983
 ] 

Rodrigo Schmidt commented on HDFS-1111:
---------------------------------------

That's the standard fsck call. 

The -list-corruptfiles option makes the getCorruptFiles() call to the 
namenode. This call eventually looks at the needed_replication queue, which 
doesn't know about directories, paths, or files. It picks up all blocks that 
are in the "missing" queue, finds their INodes, gets their paths, and 
truncates the output according to the limit to be returned. The path filter 
is applied only after the getCorruptFiles() call returns, so it only sees 
the already-truncated list.
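
To make the ordering concrete, here is a minimal sketch of why filtering after 
truncation can hide files. It is not the namenode code: the class name, both 
method signatures, and the sample paths are made up for illustration, and the 
"missing" queue is reduced to a plain list of paths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TruncateThenFilter {
    // Hypothetical stand-in for the namenode side: take paths in queue order
    // and stop as soon as the output limit is reached.
    static List<String> getCorruptFiles(List<String> missingPaths, int limit) {
        List<String> out = new ArrayList<>();
        for (String p : missingPaths) {
            if (out.size() >= limit) {
                break;
            }
            out.add(p);
        }
        return out;
    }

    // Hypothetical stand-in for the fsck side: keep only paths under dir,
    // applied to whatever the call above returned.
    static List<String> filterByPath(List<String> files, String dir) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        List<String> out = new ArrayList<>();
        for (String f : files) {
            if (f.startsWith(prefix)) {
                out.add(f);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> missing = Arrays.asList("/a/f1", "/a/f2", "/b/f3");
        List<String> truncated = getCorruptFiles(missing, 2);
        // /b/f3 is corrupt, but it was cut off before the filter ran.
        System.out.println(filterByPath(truncated, "/b")); // prints []
    }
}
```

With a limit of 2, the corrupt file under /b never reaches the filter, so a 
user asking about /b gets an empty list with no sign that anything was dropped.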

I think it's possible to change the API to pass the specific path you are 
interested in. I just created HDFS-1265 for that.
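
As a rough illustration of what passing the path could look like, the sketch 
below applies the path check before counting against the limit. This is only 
an assumption about the shape of such a call, not what HDFS-1265 implements; 
the class, the three-argument signature, and the list-based queue are 
hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PathAwareListing {
    // Hypothetical path-aware variant: skip paths outside dir first, so the
    // limit is spent only on entries the caller asked about.
    static List<String> getCorruptFiles(List<String> missingPaths,
                                        String dir, int limit) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        List<String> out = new ArrayList<>();
        for (String p : missingPaths) {
            if (!p.startsWith(prefix)) {
                continue;
            }
            if (out.size() >= limit) {
                break;
            }
            out.add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> missing = Arrays.asList("/a/f1", "/a/f2", "/b/f3");
        System.out.println(getCorruptFiles(missing, "/b", 2)); // prints [/b/f3]
    }
}
```

Even in this form the result is still capped by the limit, so a directory with 
more corrupt files than the limit would again come back truncated.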

However, the truncation problem addressed by this JIRA will remain. I think 
it's good for users to know whether the list is complete or not.

> getCorruptFiles() should give some hint that the list is not complete
> ---------------------------------------------------------------------
>
>                 Key: HDFS-1111
>                 URL: https://issues.apache.org/jira/browse/HDFS-1111
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Rodrigo Schmidt
>            Assignee: Rodrigo Schmidt
>         Attachments: HADFS-1111.0.patch
>
>
> The list of corrupt files returned by the namenode doesn't say anything if 
> the number of corrupted files is larger than the call output limit (which 
> means the list is not complete). There should be a way to hint incompleteness 
> to clients.
> A simple hack would be to add an extra entry to the array returned with the 
> value null. Clients could interpret this as a sign that there are other 
> corrupt files in the system.
> We should also do some rephrasing of the fsck output to make it more 
> confident when the list is complete and less confident when the list is 
> known to be incomplete.
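
The null-entry idea from the description above could look roughly like the 
following sketch. It is an illustration under assumptions, not the attached 
patch: the class and method names are invented, and the returned type is 
simplified to an array of path strings.

```java
import java.util.Arrays;
import java.util.List;

public class CorruptFileSentinel {
    // Hypothetical server side: return at most `limit` paths, and append a
    // trailing null only when entries had to be left out.
    static String[] listWithSentinel(List<String> corrupt, int limit) {
        if (corrupt.size() <= limit) {
            return corrupt.toArray(new String[0]);
        }
        String[] out = new String[limit + 1];
        for (int i = 0; i < limit; i++) {
            out[i] = corrupt.get(i);
        }
        out[limit] = null; // sentinel: more corrupt files exist
        return out;
    }

    // Hypothetical client side: a trailing null means the list was truncated.
    static boolean isComplete(String[] listing) {
        return listing.length == 0 || listing[listing.length - 1] != null;
    }

    public static void main(String[] args) {
        List<String> corrupt = Arrays.asList("/a/f1", "/a/f2", "/b/f3");
        System.out.println(isComplete(listWithSentinel(corrupt, 2))); // prints false
        System.out.println(isComplete(listWithSentinel(corrupt, 5))); // prints true
    }
}
```

A client that does not know about the sentinel would have to tolerate a null 
element, which is one reason this is described as a hack rather than a clean 
API change.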

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
