[ http://issues.apache.org/jira/browse/HADOOP-101?page=comments#action_12371659 ]

Doug Cutting commented on HADOOP-101:
-------------------------------------

I like that this does not use anything more than the client API to check the 
server.  That keeps the server core lean and mean.  The use of RPCs 
effectively restricts the impact of the scan on the FS.

A datanode operation that streams through a block without transferring it over 
the wire won't correctly check checksums using our existing mechanism.  To 
check file content we could instead simply implement a map-reduce job that 
streams through all the files in the fs.  This would not take much code: 
nothing additional in the core.  MapReduce should handle the locality, so that 
most data shouldn't go over the wire.
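A rough sketch of the per-file check such a job would perform. It is written as plain Java rather than against the MapReduce API, so locality and task scheduling are not modelled. The CRC32 checksum, the in-memory map of expected checksums, and the names ContentCheckSketch, streamingCrc, mapCheck and corruptFiles are hypothetical stand-ins, not the DFS's actual checksum format: each "map" call streams one file in 4 KB chunks and compares its checksum, and the collecting loop plays the role of the reduce.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

// Hypothetical sketch: not Hadoop's MapReduce API or its on-disk checksum format.
public class ContentCheckSketch {

    // Streams the input in fixed-size chunks and returns the CRC32 of all bytes read.
    static long streamingCrc(InputStream in) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            crc.update(buf, 0, n);
        }
        return crc.getValue();
    }

    // "Map" step: true if the streamed content matches the expected checksum.
    static boolean mapCheck(InputStream content, long expectedCrc) throws IOException {
        return streamingCrc(content) == expectedCrc;
    }

    // Driver plus "reduce" step: runs the check on every file and collects the
    // paths that have no recorded checksum or whose content fails it.
    static List<String> corruptFiles(Map<String, byte[]> files, Map<String, Long> expected)
            throws IOException {
        List<String> corrupt = new ArrayList<>();
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            Long exp = expected.get(e.getKey());
            try (InputStream in = new ByteArrayInputStream(e.getValue())) {
                if (exp == null || !mapCheck(in, exp)) {
                    corrupt.add(e.getKey());
                }
            }
        }
        return corrupt;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("/a", "hello".getBytes(StandardCharsets.UTF_8));
        files.put("/b", "world".getBytes(StandardCharsets.UTF_8));

        Map<String, Long> expected = new LinkedHashMap<>();
        expected.put("/a", streamingCrc(new ByteArrayInputStream(files.get("/a"))));
        // Checksum recorded for "worlD": the stored bytes of /b differ by one bit.
        expected.put("/b", streamingCrc(new ByteArrayInputStream(
                "worlD".getBytes(StandardCharsets.UTF_8))));

        System.out.println(corruptFiles(files, expected)); // prints [/b]
    }
}
```

Streaming in fixed-size chunks keeps memory use constant regardless of file size; in a real job the map input would be the list of files and the reduce output the set of corrupt paths.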

BTW, blocks not used by any file are not known to the name node, are they?  
When they're reported by a datanode the datanode is told to remove them.
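A toy model of the block-report behaviour described above, assuming it holds as stated: the name node only tracks blocks that belong to some file, and answers a datanode's block report with the reported blocks it does not know, which the datanode is then told to remove. The class and method names (BlockReportSketch, addFileBlock, processReport) are hypothetical and are not the name node's real code; the point is only that such orphan blocks never enter the name node's tables, so a namespace-based check would not see them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of name node block-report handling; not Hadoop's actual classes.
public class BlockReportSketch {
    // Block IDs that belong to some file in the namespace.
    private final Set<Long> blocksInFiles = new HashSet<>();

    void addFileBlock(long blockId) {
        blocksInFiles.add(blockId);
    }

    // Returns the reported block IDs that belong to no file; the datanode
    // would be told to delete these, so they are never recorded here.
    List<Long> processReport(long[] reported) {
        List<Long> toDelete = new ArrayList<>();
        for (long id : reported) {
            if (!blocksInFiles.contains(id)) {
                toDelete.add(id);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        BlockReportSketch nn = new BlockReportSketch();
        nn.addFileBlock(1L);
        nn.addFileBlock(2L);
        System.out.println(nn.processReport(new long[] {1L, 2L, 3L})); // prints [3]
    }
}
```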


> DFSck - fsck-like utility for checking DFS volumes
> --------------------------------------------------
>
>          Key: HADOOP-101
>          URL: http://issues.apache.org/jira/browse/HADOOP-101
>      Project: Hadoop
>         Type: New Feature
>   Components: dfs
>     Versions: 0.2
>     Reporter: Andrzej Bialecki 
>     Assignee: Andrzej Bialecki 
>  Attachments: DFSck.java
>
> This is a utility to check the health status of a DFS volume, and to collect some 
> additional statistics.

-- 
This message is automatically generated by JIRA.
