[ http://issues.apache.org/jira/browse/HADOOP-101?page=comments#action_12371658 ]

Yoram Arnon commented on HADOOP-101:
------------------------------------

This is a good and long-awaited feature, filed previously as HADOOP-95. I vote
to check it in because, as you say, it's much better than anything we have and
it is of critical importance.

Regarding performance, the name server clearly will not be overwhelmed, but the
operation may take a very long time to execute. It's one thing to traverse a
million entries in memory (for a modest 32TB FS), but quite another to execute
a hundred thousand RPC calls from a single client. Also, once we change the
open command so that it no longer returns the entire list of blocks (to shorten
the time it takes to open a file, especially when reading just a few blocks
from a very large file), this implementation will need to change.
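To put rough numbers on that comparison, here is a back-of-the-envelope sketch.
It assumes a 32 MB block size (the figure that makes 32TB come out to about a
million blocks) and a purely hypothetical 1 ms round trip per RPC; neither
number is measured, and the class and method names are illustrative only.

```java
public class FsckCostEstimate {
    // Assumption: 32 MB blocks, so 32 TB maps to roughly one million blocks.
    static final long BLOCK_SIZE = 32L * 1024 * 1024;

    // Number of blocks needed to hold a volume of the given size (rounded up).
    static long blockCount(long volumeBytes) {
        return (volumeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Wall-clock time for RPCs issued one after another by a single client,
    // given a hypothetical per-call latency in milliseconds.
    static double sequentialRpcSeconds(long calls, double latencyMillis) {
        return calls * latencyMillis / 1000.0;
    }

    public static void main(String[] args) {
        long volume = 32L * 1024 * 1024 * 1024 * 1024; // 32 TB
        System.out.println("blocks: " + blockCount(volume));
        System.out.println("seconds for 100000 RPCs at 1 ms each: "
                + sequentialRpcSeconds(100_000, 1.0));
    }
}
```

Even at an optimistic 1 ms per call, the client-side walk spends minutes in
round trips, whereas an in-memory traversal of the same million entries on the
server avoids that latency entirely.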

Lastly, there's extensibility. We'll want to test for things that are available 
only on the name server, like blocks that are not used by any file.
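As an illustration of why such checks need the name server's full view: finding
blocks that no file uses is a set difference between all known blocks and the
blocks referenced by some file, and a client that only walks files never sees
the first set. The sketch below is hypothetical; representing blocks as
`Long` IDs and files as a path-to-block-list map is an assumption, not how the
name server actually stores them.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OrphanBlockCheck {
    // Hypothetical check: returns blocks the server knows about (e.g. from
    // datanode reports) that no file's block list references.
    static Set<Long> findOrphanBlocks(Set<Long> knownBlocks,
                                      Map<String, List<Long>> fileToBlocks) {
        Set<Long> referenced = new HashSet<>();
        for (List<Long> blocks : fileToBlocks.values()) {
            referenced.addAll(blocks);
        }
        Set<Long> orphans = new HashSet<>(knownBlocks);
        orphans.removeAll(referenced); // known minus referenced
        return orphans;
    }

    public static void main(String[] args) {
        Set<Long> known = Set.of(1L, 2L, 3L, 4L);
        Map<String, List<Long>> files = Map.of(
                "/a", List.of(1L, 2L),
                "/b", List.of(3L));
        System.out.println(findOrphanBlocks(known, files)); // prints [4]
    }
}
```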

Wouldn't it be better to have the server execute this code internally and
report the results either to the client or to a local file?

> DFSck - fsck-like utility for checking DFS volumes
> --------------------------------------------------
>
>          Key: HADOOP-101
>          URL: http://issues.apache.org/jira/browse/HADOOP-101
>      Project: Hadoop
>         Type: New Feature
>   Components: dfs
>     Versions: 0.2
>     Reporter: Andrzej Bialecki 
>     Assignee: Andrzej Bialecki 
>  Attachments: DFSck.java
>
> This is a utility to check the health status of a DFS volume and collect some
> additional statistics.

-- 
This message is automatically generated by JIRA.