[ http://issues.apache.org/jira/browse/HADOOP-101?page=comments#action_12371689 ]
Andrzej Bialecki commented on HADOOP-101:
------------------------------------------

Wow, lots of comments, let me address some of them:

* re: locking. I also see this as an advantage, fsck can run in parallel with normal operations. If someone else deletes a file, no big deal - the name is removed from the namesystem, so if we suddenly detect missing blocks we could always check if a file with this name still exists in the namesystem.

* re: performance. Sure, we could parallelize this, which should speed things up (currently it's rather slow, checking ~1TB takes > 2 hours), but then it would put a higher load on the namenode. Perhaps we could make this an option, e.g. start a configurable pool of fsck threads in parallel.

* re: blocks not in use by any file. I think this is already handled internally by namenode<->datanode protocol (for good and for bad), i.e. namenode detects orphaned blocks and tells datanodes to remove them. See FSNamesystem:924 .

* handling the reverse situation (missing blocks in existing files) should be straightforward, with the use of /lost+found directory: for each corrupted file a directory would be created there, and remaining chains of consecutive blocks would be stored in that directory.

* re: checking blocks through streaming: +1, I like the concept, could you perhaps implement it? ;)

Also, what happens if a mapred task tries to retrieve a missing/corrupted block? I think currently this hangs the task, due to a missing break in the while loop in DFSClient:354.

> DFSck - fsck-like utility for checking DFS volumes
> --------------------------------------------------
>
>          Key: HADOOP-101
>          URL: http://issues.apache.org/jira/browse/HADOOP-101
>      Project: Hadoop
>         Type: New Feature
>   Components: dfs
>     Versions: 0.2
>     Reporter: Andrzej Bialecki
>     Assignee: Andrzej Bialecki
>  Attachments: DFSck.java
>
> This is a utility to check health status of a DFS volume, and collect some
> additional statistics.
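
To make the /lost+found idea from the comment a bit more concrete, here is a minimal sketch of how the "remaining chains of consecutive blocks" could be computed for one corrupted file. It is not taken from the attached DFSck.java: the class name BlockChains, the method split, and the boolean[] representation of "block i still has a replica" are all assumptions made for illustration, and no namenode or datanode calls are involved.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of DFSck.java): groups surviving blocks into chains. */
final class BlockChains {
    private BlockChains() {}

    /**
     * Splits a file's block list into maximal runs of consecutive surviving blocks.
     *
     * @param present present[i] is true if block i of the file can still be read
     *                (assumed representation, for illustration only)
     * @return chains of block indices, in file order; each chain could be
     *         stored as one salvaged piece under /lost+found
     */
    static List<List<Integer>> split(boolean[] present) {
        List<List<Integer>> chains = new ArrayList<>();
        List<Integer> current = null;
        for (int i = 0; i < present.length; i++) {
            if (present[i]) {
                if (current == null) {
                    // First surviving block after a gap (or at the start) opens a new chain.
                    current = new ArrayList<>();
                    chains.add(current);
                }
                current.add(i);
            } else {
                // A missing block ends the current chain.
                current = null;
            }
        }
        return chains;
    }
}
```

For a file whose blocks 2, 4 and 5 are missing out of 7, this yields three chains (blocks 0-1, block 3, block 6), each of which would become a separate entry in that file's directory under /lost+found.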

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
