[
https://issues.apache.org/jira/browse/HADOOP-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629886#action_12629886
]
Raghu Angadi commented on HADOOP-3981:
--------------------------------------
> Why do you use the datanode's socket/opcode interface rather than adding a
> method to ClientDatanodeProtocol?
Nicholas had briefly talked to me regarding this. I was OK with either way. If RPCs
are used, then other RPCs on the port should be prepared to handle delays on
the order of minutes, since these checksum RPCs compete with the rest of the
disk accesses. And there could be quite a few of these requests.
The datanode has just 3 RPC handlers. We probably should not increase the handlers
for this reason, since checksum load would be very rare and the DataNode is thread
starved already.
> Need a distributed file checksum algorithm for HDFS
> ---------------------------------------------------
>
> Key: HADOOP-3981
> URL: https://issues.apache.org/jira/browse/HADOOP-3981
> Project: Hadoop Core
> Issue Type: New Feature
> Components: dfs
> Reporter: Tsz Wo (Nicholas), SZE
> Attachments: 3981_20080909.patch
>
>
> Traditional message digest algorithms, like MD5, SHA1, etc., require reading
> the entire input message sequentially in a central location. HDFS supports
> large files of multiple terabytes. The overhead of reading the entire
> file is huge. A distributed file checksum algorithm is needed for HDFS.
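The attached patch defines the actual algorithm; purely as an illustration of the idea described above (not the patch's exact scheme), a file checksum can be made distributed by hashing each block independently and then hashing the concatenation of the per-block digests. A minimal sketch in Python, with a hypothetical tiny block size for readability:

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical tiny block size for illustration; HDFS blocks are far larger


def block_digests(data: bytes, block_size: int = BLOCK_SIZE):
    # Each block's digest depends only on that block's bytes, so it can be
    # computed independently (e.g. by the datanode holding the block) and the
    # work parallelizes across the cluster.
    return [hashlib.md5(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def file_checksum(data: bytes) -> str:
    # Combine the per-block digests into a single file-level checksum by
    # hashing their concatenation, in block order.
    return hashlib.md5(b"".join(block_digests(data))).hexdigest()
```

Only the small per-block digests, not the file contents, need to travel to a central location, which is what removes the cost of reading a multi-terabyte file sequentially in one place.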