[jira] Commented: (HADOOP-3205) FSInputChecker and FSOutputSummer should allow better access to user buffer

Todd Lipcon (JIRA) Tue, 03 Nov 2009 19:28:36 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773168#action_12773168 ]


Todd Lipcon commented on HADOOP-3205:
-------------------------------------

bq. I don't see why we can't use pure java CRC32 from HADOOP-6148

We already do use this - the microbenchmark above (reading checksummed files 
from /dev/shm) shows that CRC is the majority of the CPU overhead in 
FSInputChecker and that array copying makes up very little of the time.

bq. When the user gives large buffer, there is no need to copy to intermediate 
buffer

I see. So you're suggesting that we do away with the internal 
BufferedInputStream in DFSClient.BlockReader and insert a buffer only when 
the user-provided buffer is small? That seems like a fair amount of 
confusing complexity, given the buffer management involved.
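
To make the buffer-management question concrete, here is a minimal sketch of 
the idea as I understand it: read straight into the caller's array when it is 
at least as large as the intermediate buffer, and go through the intermediate 
buffer only for small reads. BypassingReader and its counters are hypothetical 
names for illustration; this is not the DFSClient.BlockReader code, and it 
ignores checksum chunk alignment entirely.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical illustration only; not taken from DFSClient.BlockReader. */
class BypassingReader {
    private final InputStream in;
    private final byte[] buf;  // intermediate buffer, used only for small reads
    private int pos = 0;       // next unread byte in buf
    private int count = 0;     // number of valid bytes in buf
    int directReads = 0;       // reads that skipped the intermediate buffer
    int bufferedFills = 0;     // times the intermediate buffer was refilled

    BypassingReader(InputStream in, int bufSize) {
        this.in = in;
        this.buf = new byte[bufSize];
    }

    int read(byte[] user, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (pos == count) {              // nothing left in the intermediate buffer
            if (len >= buf.length) {     // large user buffer: no intermediate copy
                directReads++;
                return in.read(user, off, len);
            }
            pos = 0;
            count = in.read(buf, 0, buf.length);
            if (count < 0) {             // end of stream
                count = 0;
                return -1;
            }
            bufferedFills++;
        }
        // Serve the request from whatever is buffered (may be a short read).
        int n = Math.min(len, count - pos);
        System.arraycopy(buf, pos, user, off, n);
        pos += n;
        return n;
    }

    public static void main(String[] args) throws IOException {
        BypassingReader r =
            new BypassingReader(new ByteArrayInputStream(new byte[1000]), 64);
        r.read(new byte[512], 0, 512);   // large read: goes straight to the stream
        r.read(new byte[10], 0, 10);     // small read: served through the buffer
        System.out.println("direct=" + r.directReads + " fills=" + r.bufferedFills);
    }
}
```

Even in this toy form there are two code paths plus the leftover-bytes case 
to keep consistent, which is the complexity I'm worried about.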

Do we have some kind of benchmark that indicates that these copies make up any 
appreciable overhead compared to the fairly slow checksumming?
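
As a rough sketch of the kind of benchmark I mean, something like the 
following compares one pass of per-chunk checksumming against one pass of 
per-chunk array copying over the same data. The assumptions are mine: 
CopyVsCrcBench is a hypothetical name, java.util.zip.CRC32 stands in for the 
pure Java CRC32 from HADOOP-6148, and the 512-byte chunk size follows the 
figure in the issue description. It is not a rigorous harness, and numbers 
from a current JVM will not reflect the ones discussed in this thread.

```java
import java.util.Random;
import java.util.zip.CRC32;

/** Hypothetical micro-benchmark sketch: CRC32 per chunk vs. arraycopy per chunk. */
class CopyVsCrcBench {
    static final int BYTES_PER_CHECKSUM = 512;

    /** Checksums data chunk by chunk; returns the XOR of the chunk CRCs
     *  so the work cannot be optimized away. */
    static long crcPass(byte[] data) {
        CRC32 crc = new CRC32();
        long acc = 0;
        for (int off = 0; off < data.length; off += BYTES_PER_CHECKSUM) {
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            acc ^= crc.getValue();
        }
        return acc;
    }

    /** Copies src into dst chunk by chunk, mimicking an intermediate-buffer copy. */
    static long copyPass(byte[] src, byte[] dst) {
        for (int off = 0; off < src.length; off += BYTES_PER_CHECKSUM) {
            int len = Math.min(BYTES_PER_CHECKSUM, src.length - off);
            System.arraycopy(src, off, dst, off, len);
        }
        return dst.length == 0 ? 0 : dst[dst.length - 1];
    }

    public static void main(String[] args) {
        byte[] src = new byte[16 << 20];
        new Random(42).nextBytes(src);
        byte[] dst = new byte[src.length];
        long sink = 0;
        for (int i = 0; i < 5; i++) {            // warm-up so the JIT compiles both loops
            sink += crcPass(src) + copyPass(src, dst);
        }
        long t0 = System.nanoTime();
        for (int i = 0; i < 10; i++) { sink += crcPass(src); }
        long t1 = System.nanoTime();
        for (int i = 0; i < 10; i++) { sink += copyPass(src, dst); }
        long t2 = System.nanoTime();
        System.out.printf("crc: %d ms, copy: %d ms (sink=%d)%n",
            (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sink);
    }
}
```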



> FSInputChecker and FSOutputSummer should allow better access to user buffer
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-3205
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3205
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>
> Implementations of FSInputChecker and FSOutputSummer like DFS do not have 
> access to full user buffer. At any time DFS can access only up to 512 bytes 
> even though user usually reads with a much larger buffer (often controlled by 
> io.file.buffer.size). This requires implementations to double buffer data if 
> an implementation wants to read or write larger chunks of data from 
> underlying storage.
> We could separate changes for FSInputChecker and FSOutputSummer into two 
> separate jiras.

