[ https://issues.apache.org/jira/browse/HADOOP-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773168#action_12773168 ]
Todd Lipcon commented on HADOOP-3205:
-------------------------------------
bq. I don't see why we can't use pure java CRC32 from HADOOP-6148
We already use it. The microbenchmark above (reading checksummed files from
/dev/shm) shows that CRC computation accounts for the majority of the CPU
overhead in FSInputChecker and that array copying takes very little of the time.
bq. When the user gives large buffer, there is no need to copy to intermediate
buffer
I see. So you're suggesting that we do away with the internal
BufferedInputStream in DFSClient.BlockReader and insert a buffer only when
the user-provided buffer is small? That seems like a fair amount of confusing
complexity because of the buffer management involved.
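To make the buffer-management concern concrete, here is a minimal sketch of what such a conditional bypass might look like. It is not the DFSClient.BlockReader code: the class name BypassingReader, its fields, and the "user buffer at least as large as the internal buffer" threshold are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; this is not the DFSClient.BlockReader code.
class BypassingReader {
    private final InputStream in;
    private final byte[] internal; // used only for small caller buffers
    private int pos = 0;           // next unread byte in internal
    private int limit = 0;         // count of valid bytes in internal
    int directReads = 0;           // reads that skipped the internal buffer
    int bufferedFills = 0;         // refills of the internal buffer

    BypassingReader(InputStream in, int internalSize) {
        this.in = in;
        this.internal = new byte[internalSize];
    }

    int read(byte[] user, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (pos == limit) {
            // Internal buffer is empty. A large caller buffer is filled
            // directly from the stream, with no intermediate copy.
            if (len >= internal.length) {
                directReads++;
                return in.read(user, off, len);
            }
            // Small caller buffer: refill the internal buffer first.
            int filled = in.read(internal, 0, internal.length);
            if (filled <= 0) {
                return filled; // end of stream
            }
            bufferedFills++;
            pos = 0;
            limit = filled;
        }
        // Leftover buffered bytes must be handed out before any direct
        // read, otherwise data would be returned out of order.
        int n = Math.min(len, limit - pos);
        System.arraycopy(internal, pos, user, off, n);
        pos += n;
        return n;
    }

    public static void main(String[] args) throws IOException {
        BypassingReader r =
            new BypassingReader(new ByteArrayInputStream(new byte[4096]), 512);
        r.read(new byte[2048], 0, 2048); // large buffer: direct read
        r.read(new byte[100], 0, 100);   // small buffer: via internal buffer
        System.out.println("direct=" + r.directReads
            + " fills=" + r.bufferedFills);
    }
}
```

Even in this toy form, the reader has to track leftover buffered bytes and drain them before it can go direct again, which is the kind of bookkeeping the paragraph above is wary of.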
Do we have some kind of benchmark that indicates that these copies make up any
appreciable overhead compared to the fairly slow checksumming?
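As a starting point for such a comparison, a rough micro-benchmark could look like the sketch below. It makes several assumptions: java.util.zip.CRC32 stands in for the pure Java CRC32 from HADOOP-6148, in-memory arrays stand in for the files in /dev/shm, and the class and method names (CopyVsCrcBench, checksumChunks, copyChunks) are made up. The numbers it prints depend heavily on the JVM and hardware, so it only suggests the shape of a measurement.

```java
import java.util.Random;
import java.util.zip.CRC32;

// Hypothetical micro-benchmark sketch; results depend on JVM and hardware.
class CopyVsCrcBench {
    static final int CHUNK = 512; // bytes per checksum, per the issue text

    // Checksums the data one CHUNK at a time and XORs the values together
    // so that the work cannot be optimized away.
    static long checksumChunks(byte[] data) {
        CRC32 crc = new CRC32();
        long acc = 0;
        for (int off = 0; off < data.length; off += CHUNK) {
            crc.reset();
            crc.update(data, off, Math.min(CHUNK, data.length - off));
            acc ^= crc.getValue();
        }
        return acc;
    }

    // Copies src to dst one CHUNK at a time, mimicking an intermediate
    // buffer copy for every checksummed chunk.
    static void copyChunks(byte[] src, byte[] dst) {
        for (int off = 0; off < src.length; off += CHUNK) {
            System.arraycopy(src, off, dst, off,
                Math.min(CHUNK, src.length - off));
        }
    }

    public static void main(String[] args) {
        byte[] src = new byte[8 * 1024 * 1024];
        new Random(0).nextBytes(src);
        byte[] dst = new byte[src.length];
        int iters = 50;
        long sink = 0;

        // Warm-up so that both loops are JIT-compiled before timing.
        for (int i = 0; i < iters; i++) {
            sink ^= checksumChunks(src);
            copyChunks(src, dst);
        }

        long t0 = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            sink ^= checksumChunks(src);
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            copyChunks(src, dst);
            sink ^= dst[i % dst.length];
        }
        long t2 = System.nanoTime();

        System.out.printf("crc: %d ms, copy: %d ms (sink=%d)%n",
            (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sink);
    }
}
```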
> FSInputChecker and FSOutputSummer should allow better access to user buffer
> ---------------------------------------------------------------------------
>
> Key: HADOOP-3205
> URL: https://issues.apache.org/jira/browse/HADOOP-3205
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
>
> Implementations of FSInputChecker and FSOutputSummer, such as DFS, do not have
> access to the full user buffer. At any time DFS can access only up to 512 bytes,
> even though the user usually reads with a much larger buffer (often controlled
> by io.file.buffer.size). An implementation that wants to read or write larger
> chunks of data from underlying storage therefore has to double-buffer the data.
> We could split the changes for FSInputChecker and FSOutputSummer into two
> separate JIRAs.