[ http://issues.apache.org/jira/browse/HADOOP-101?page=comments#action_12371689 ]

Andrzej Bialecki  commented on HADOOP-101:
------------------------------------------

Wow, lots of comments. Let me address some of them:

* re: locking. I also see this as an advantage: fsck can run in parallel with 
normal operations. If someone else deletes a file, it's no big deal. The name 
is removed from the namesystem, so if we suddenly detect missing blocks we can 
always check whether a file with this name still exists in the namesystem.

* re: performance. Sure, we could parallelize this, which should speed things 
up (currently it's rather slow: checking ~1TB takes more than 2 hours), but 
then it would put a higher load on the namenode. Perhaps we could make this an 
option, e.g. a configurable pool of fsck threads running in parallel.

* re: blocks not in use by any file. I think this is already handled internally 
by the namenode<->datanode protocol (for better or worse), i.e. the namenode 
detects orphaned blocks and tells datanodes to remove them. See 
FSNamesystem:924.

* Handling the reverse situation (missing blocks in existing files) should be 
straightforward using a /lost+found directory: for each corrupted file a 
directory would be created there, and the remaining chains of consecutive 
blocks would be stored in that directory.

* re: checking blocks through streaming: +1, I like the concept. Could you 
perhaps implement it? ;) Also, what happens if a mapred task tries to retrieve 
a missing/corrupted block? I think this currently hangs the task, due to a 
missing break in the while loop at DFSClient:354.
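
To make the /lost+found idea above a bit more concrete, here is a minimal sketch of how the surviving chains could be computed. It assumes fsck already knows, for each block of a file, whether at least one replica is still available. The class name LostFoundChains, the boolean[] input, and the printed /lost+found/<file>/<n> layout are all hypothetical; none of them come from DFSck.java.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of DFSck.java): groups surviving blocks into chains. */
class LostFoundChains {

    /**
     * Splits a file's block list into chains of consecutive surviving blocks.
     *
     * @param present present[i] is true if block i still has at least one replica
     * @return chains of consecutive block indices, in file order
     */
    static List<List<Integer>> chains(boolean[] present) {
        List<List<Integer>> result = new ArrayList<>();
        List<Integer> current = null;
        for (int i = 0; i < present.length; i++) {
            if (present[i]) {
                if (current == null) {
                    // first surviving block after a gap (or at the start) opens a new chain
                    current = new ArrayList<>();
                    result.add(current);
                }
                current.add(i);
            } else {
                // a missing block ends the current chain
                current = null;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed example: blocks 2, 4 and 5 of a nine-block file are missing.
        boolean[] present = {true, true, false, true, false, false, true, true, true};
        List<List<Integer>> found = chains(present);
        for (int n = 0; n < found.size(); n++) {
            // The target path layout is an assumption made for illustration only.
            System.out.println("/lost+found/<file>/" + n + " <- blocks " + found.get(n));
        }
    }
}
```

Each chain would then become one entry in the per-file directory, so whatever contiguous data survives can still be read back in order.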

> DFSck - fsck-like utility for checking DFS volumes
> --------------------------------------------------
>
>          Key: HADOOP-101
>          URL: http://issues.apache.org/jira/browse/HADOOP-101
>      Project: Hadoop
>         Type: New Feature
>   Components: dfs
>     Versions: 0.2
>     Reporter: Andrzej Bialecki 
>     Assignee: Andrzej Bialecki 
>  Attachments: DFSck.java
>
> This is a utility to check health status of a DFS volume, and collect some 
> additional statistics.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
