[
https://issues.apache.org/jira/browse/HADOOP-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773132#action_12773132
]
Raghu Angadi commented on HADOOP-3205:
--------------------------------------
bq. This was originally rejected in HADOOP-6148 due to the complexity of
maintaining two different CRC32
This jira is not about CRC32 cost, but I don't see why we can't use the pure Java
CRC32 from HADOOP-6148. It is already used in DataNode. The CRC32 implementation is
transparent to FSInputChecker. If it is good for multiple other places in
Hadoop, it is good for FileSystem as well.
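
To illustrate what "transparent" means here, a minimal sketch: the verifying
code depends only on the java.util.zip.Checksum interface, so either the JDK
CRC32 or a pure Java CRC32 can be plugged in. ChunkVerifier is a hypothetical
name for illustration, not a Hadoop class.

```java
import java.util.zip.Checksum;

/** Hypothetical helper (not a Hadoop class): verifies one chunk of data. */
final class ChunkVerifier {
    // Any Checksum implementation works here: java.util.zip.CRC32,
    // or a pure Java CRC32 that implements the same interface.
    private final Checksum sum;

    ChunkVerifier(Checksum sum) {
        this.sum = sum;
    }

    /** Returns true if buf[off, off + len) hashes to the expected value. */
    boolean verify(byte[] buf, int off, int len, long expected) {
        sum.reset();
        sum.update(buf, off, len);
        return sum.getValue() == expected;
    }
}
```

Usage would be along the lines of
new ChunkVerifier(new java.util.zip.CRC32()).verify(buf, 0, len, expected);
swapping the implementation changes only the constructor argument.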
bq. Are you suggesting here that we could do away with the internal buffer and
assume that users are always going to do large reads? Doesn't that violate the
contract of fs.open taking a buffer size?
Essentially, yes. When the user gives a large buffer, there is no need to copy to
an intermediate buffer. We would not require or assume that the user gives a large
buffer, but the common case is that the user does. DFSClient would read the
fixed-length packet header from the underlying socket and then read the data
directly into the user buffer if its size is comparable to or larger than the
packet (64k).
I don't see how any of this would violate the contract. The fs.open buffer size is
only a hint; the underlying FS should know what is optimal.
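
A rough sketch of the read path described above, under stated assumptions:
PacketReader is a hypothetical class, and the 4-byte length header is a
simplification, not the actual DFS packet format. The counters exist only to
show which path was taken. Checksum verification is left out.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical reader: copies through an internal buffer only for small reads. */
final class PacketReader {
    private final DataInputStream in;
    private final byte[] internal;   // fallback buffer, holds at most one packet
    private int pos, limit;          // unread bytes in internal are [pos, limit)
    int directReads;                 // illustration only: reads that skipped the copy
    int bufferedReads;               // illustration only: reads served from internal

    PacketReader(InputStream raw, int maxPacketSize) {
        this.in = new DataInputStream(raw);
        this.internal = new byte[maxPacketSize];
    }

    /** Reads up to len bytes; returns the count, or -1 at end of stream. */
    int read(byte[] user, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (pos == limit) {                       // nothing buffered: next packet
            int packetLen;
            try {
                packetLen = in.readInt();         // assumed fixed-length header
            } catch (EOFException e) {
                return -1;
            }
            if (packetLen <= 0 || packetLen > internal.length) {
                throw new IOException("bad packet length " + packetLen);
            }
            if (len >= packetLen) {               // user buffer is large enough
                in.readFully(user, off, packetLen);
                directReads++;
                return packetLen;
            }
            in.readFully(internal, 0, packetLen); // small read: buffer the packet
            pos = 0;
            limit = packetLen;
        }
        int n = Math.min(len, limit - pos);
        System.arraycopy(internal, pos, user, off, n);
        pos += n;
        bufferedReads++;
        return n;
    }
}
```

A caller that passes a small buffer still gets correct data, only with the
extra copy, so nothing is required of the user beyond the existing read call.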
> FSInputChecker and FSOutputSummer should allow better access to user buffer
> ---------------------------------------------------------------------------
>
> Key: HADOOP-3205
> URL: https://issues.apache.org/jira/browse/HADOOP-3205
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
>
> Implementations of FSInputChecker and FSOutputSummer like DFS do not have
> access to the full user buffer. At any time DFS can access only up to 512 bytes
> even though the user usually reads with a much larger buffer (often controlled by
> io.file.buffer.size). This requires implementations to double buffer data if
> an implementation wants to read or write larger chunks of data from
> underlying storage.
> We could separate changes for FSInputChecker and FSOutputSummer into two
> separate jiras.