[ https://issues.apache.org/jira/browse/HADOOP-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773132#action_12773132 ]

Raghu Angadi commented on HADOOP-3205:
--------------------------------------

bq. This was originally rejected in HADOOP-6148 due to the complexity of 
maintaining two different CRC32 

This jira is not about CRC32 cost, but I don't see why we can't use the pure-Java
CRC32 from HADOOP-6148. It is already used in DataNode, and the CRC32
implementation is transparent to FSInputChecker. If it is good enough for
multiple other places in Hadoop, it is good enough for FileSystem as well.
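To illustrate why the CRC32 implementation is transparent to the checker: code that depends only on the java.util.zip.Checksum interface works unchanged whichever implementation it is handed. This is a minimal sketch with a hypothetical ChunkVerifier class (not Hadoop's FSInputChecker API); java.util.zip.CRC32 stands in here, and a pure-Java CRC32 such as the one from HADOOP-6148 could be supplied the same way, assuming it implements Checksum.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical verifier: it only sees the Checksum interface, so the concrete
// CRC32 implementation can be swapped without touching this class.
public class ChunkVerifier {
    private final Supplier<Checksum> checksumFactory;

    public ChunkVerifier(Supplier<Checksum> checksumFactory) {
        this.checksumFactory = checksumFactory;
    }

    // Returns true if the CRC of buf[off, off + len) equals the expected value.
    public boolean verify(byte[] buf, int off, int len, long expected) {
        Checksum sum = checksumFactory.get();
        sum.update(buf, off, len);
        return sum.getValue() == expected;
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        ChunkVerifier v = new ChunkVerifier(CRC32::new);
        // 0xCBF43926 is the standard CRC-32 check value for "123456789".
        System.out.println(v.verify(data, 0, data.length, 0xCBF43926L)); // prints true
    }
}
```

Swapping implementations means changing only the Supplier passed to the constructor (for example, a method reference to another Checksum class); the verification logic stays the same.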

bq. Are you suggesting here that we could do away with the internal buffer and 
assume that users are always going to do large reads? Doesn't that violate the 
contract of fs.open taking a buffer size?

Essentially, yes. When the user supplies a large buffer, there is no need to copy
data into an intermediate buffer. We would not require or assume that the user
gives a large buffer, but in the common case the user does. DFSClient would read
the fixed-length packet header from the underlying socket and then read the data
directly into the user buffer if that buffer is comparable to or larger than the
packet (64 KB).
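A minimal sketch of the read path described above, assuming a simplified, hypothetical packet format: a 4-byte big-endian data length followed by the data. The real DFS packet header carries more fields and checksums, which are omitted here. When the caller's buffer can hold the whole packet, the data is read straight into it; otherwise it falls back to an internal buffer. The class and method names are illustrative, not DFSClient's, and the "comparable or larger" test is simplified to "at least the packet length".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical packet reader over a simplified wire format (4-byte length + data).
public class DirectPacketReader {
    private final DataInputStream in;
    private byte[] internal = new byte[0]; // used only when the user buffer is small
    private int internalPos = 0;
    private int internalLen = 0;
    private int directReads = 0;           // packets read with no intermediate copy

    public DirectPacketReader(InputStream in) {
        this.in = new DataInputStream(in);
    }

    public int directReads() {
        return directReads;
    }

    // Reads up to len bytes into b[off..]; returns -1 at end of stream.
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        // Serve bytes left over from a previously buffered packet first.
        if (internalPos < internalLen) {
            int n = Math.min(len, internalLen - internalPos);
            System.arraycopy(internal, internalPos, b, off, n);
            internalPos += n;
            return n;
        }
        int packetLen;
        try {
            packetLen = in.readInt(); // fixed-length header
        } catch (EOFException e) {
            return -1; // this sketch also treats a truncated header as EOF
        }
        if (len >= packetLen) {
            // Large user buffer: read the data directly into it.
            in.readFully(b, off, packetLen);
            directReads++;
            return packetLen;
        }
        // Small user buffer: fall back to double buffering.
        if (internal.length < packetLen) internal = new byte[packetLen];
        in.readFully(internal, 0, packetLen);
        internalLen = packetLen;
        System.arraycopy(internal, 0, b, off, len);
        internalPos = len;
        return len;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeInt(4);
        dos.write(new byte[] {1, 2, 3, 4});
        dos.flush();
        DirectPacketReader r = new DirectPacketReader(new ByteArrayInputStream(bos.toByteArray()));
        System.out.println(r.read(new byte[64], 0, 64)); // prints 4
        System.out.println(r.directReads());             // prints 1
    }
}
```

The internal buffer is only allocated when a caller actually reads with a buffer smaller than a packet, so the common large-buffer case never pays for the extra copy.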

I don't see how any of this would violate the contract. The fs.open buffer size
is only a hint; the underlying FS should know what is optimal.

> FSInputChecker and FSOutputSummer should allow better access to user buffer
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-3205
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3205
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>
> Implementations of FSInputChecker and FSOutputSummer, such as DFS, do not
> have access to the full user buffer. At any time DFS can access only up to
> 512 bytes, even though the user usually reads with a much larger buffer (often
> controlled by io.file.buffer.size). This forces an implementation to
> double-buffer data if it wants to read or write larger chunks of data from
> the underlying storage.
> We could split the changes for FSInputChecker and FSOutputSummer into two
> separate jiras.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
