[ https://issues.apache.org/jira/browse/HDFS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894214#action_12894214 ]

Konstantin Shvachko commented on HDFS-1111:
-------------------------------------------

I see there is a reference to my participation in HDFS-729, so there is nobody 
to blame but myself.

I think the lesson we learned with listing directories applies here as well, 
because it has the same issue: we do not guarantee that we list all directory 
entries as a single snapshot, because there could be too many of them. We only 
guarantee to return the current consecutive list of N entries following the 
specified name. The rest may have changed by the time the list of N is 
displayed.
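
For comparison, here is a minimal sketch of that directory-listing contract as 
a client-side loop (the exact {{getListing}} signature shown is illustrative, 
and {{process()}} is a placeholder):

{code:java}
// Sketch: iterative directory listing, paged by the last returned name.
// Each call returns up to N consecutive entries following startAfter;
// entries may change between calls, so the result is not a snapshot.
byte[] startAfter = HdfsFileStatus.EMPTY_NAME;
DirectoryListing page;
do {
  page = namenode.getListing(src, startAfter, false);
  if (page == null) {
    break;  // the directory was deleted between calls
  }
  for (HdfsFileStatus entry : page.getPartialListing()) {
    process(entry);
  }
  startAfter = page.getLastName();
} while (page.hasMore());
{code}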

With Sriram's approach we actually list the blocks of corrupted files and 
provide info about the files they belong to. This is different from the 
previously discussed approach. 
- So I propose to rename the method and the respective fsck option to 
{{listCorruptFileBlocks}} instead of {{listCorruptFile}}.

The paging in Sriram's proposal is done by blockId. Since the blocks in the 
{{UnderReplicatedBlocks}} queues are ordered by blockId, this provides more 
natural paging semantics than "skip K and return the next N", one of the 
variants considered before. Paging by blockId is in a sense the same as in 
listing directories: fsck guarantees to return a consecutive list of N corrupt 
blocks with ids greater than the given one.
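
In code, the proposed contract could look roughly like the following (the 
method signature, return type, and accessors are hypothetical, sketched here 
only to illustrate the paging semantics):

{code:java}
// Hypothetical sketch: paging corrupt file blocks by blockId.
// Each call returns up to N corrupt blocks with blockId strictly greater
// than the cursor, ordered by blockId, together with the files they belong
// to. Feeding the last returned blockId back in continues the scan.
long cursor = Long.MIN_VALUE;  // start of the scan; blockIds are longs
CorruptFileBlocks page;        // hypothetical return type
do {
  page = namenode.listCorruptFileBlocks(path, cursor);
  for (CorruptFileBlockInfo info : page.getBlocks()) {  // hypothetical
    System.out.println(info.getBlockId() + "\t" + info.getFilePath());
    cursor = info.getBlockId();
  }
} while (page.hasMore());
{code}

As with directory listing, the result is not a snapshot: blocks fixed or newly 
corrupted between calls may not be reflected consistently.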

ClientProtocol changes. My point is that any new feature included in the code 
needs to be supported, which is not free. And supporting a feature that nobody 
uses is particularly inefficient and even frustrating, not that we don't 
already have some of those. 
RAID may be a good use case for this API, but I agree with Rodrigo that it is 
a topic for a different discussion and we should take it out of this issue. I 
surely don't have enough context, but maybe RAID can query the NN for corrupt 
blocks the same way fsck does.


> getCorruptFiles() should give some hint that the list is not complete
> ---------------------------------------------------------------------
>
>                 Key: HDFS-1111
>                 URL: https://issues.apache.org/jira/browse/HDFS-1111
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Rodrigo Schmidt
>            Assignee: Rodrigo Schmidt
>         Attachments: HADFS-1111.0.patch
>
>
> The list of corrupt files returned by the namenode doesn't say anything if 
> the number of corrupted files is larger than the call output limit (which 
> means the list is not complete). There should be a way to hint 
> incompleteness to clients.
> A simple hack would be to add an extra entry with the value null to the 
> returned array. Clients could interpret this as a sign that there are other 
> corrupt files in the system (see the sketch below).
> We should also do some rephrasing of the fsck output to make it more 
> confident when the list is complete and less confident when the list is 
> known to be incomplete.
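
For illustration, the null-sentinel hack described above could be consumed on 
the client side roughly like this (a sketch only; the {{getCorruptFiles()}} 
return type is assumed here to be an array of file statuses):

{code:java}
// Sketch of the proposed null-sentinel hack: the namenode appends a
// trailing null entry when the corrupt-file list was truncated at the
// output limit; the client treats it as an incompleteness hint.
FileStatus[] corrupt = namenode.getCorruptFiles();
boolean truncated =
    corrupt.length > 0 && corrupt[corrupt.length - 1] == null;
int n = truncated ? corrupt.length - 1 : corrupt.length;
for (int i = 0; i < n; i++) {
  System.out.println(corrupt[i].getPath());
}
if (truncated) {
  System.out.println("The list is incomplete; more corrupt files exist.");
}
{code}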

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
