[ 
https://issues.apache.org/jira/browse/HDFS-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492024#comment-13492024
 ] 

Trevor Robinson commented on HDFS-3529:
---------------------------------------

My patch should enable native CRC on the datanode side, so it sounds like we 
just need to enable it on the client side too. This patch seemed large enough 
already that I wanted to get it reviewed and committed before embarking on the 
next step.

FYI, I swapped 4 HDDs for 4 SSDs in the E3-1240 system I mentioned earlier. On 
trunk, I get a TestDFSIO write throughput of 257 MiB/s (96 files of 1.5 GiB 
each, 4 mappers, 1 node); with the patch, I get 326 MiB/s (a speedup of 27%). 
I don't think people generally run Hadoop on SSDs, but I just wanted to point 
out that this patch alone can make a significant difference in some 
configurations.
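
For illustration only (this is not code from the patch), here is a minimal 
sketch of checksumming a direct buffer chunk by chunk without first copying 
it into a byte array. The class name, the method name, and the 512-byte chunk 
size are hypothetical, and it uses the JDK's java.util.zip.CRC32 as a 
stand-in for whatever checksum implementation the datanode actually calls.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical sketch; none of these names come from the HDFS code base.
class DirectBufferChecksumSketch {
    // Assumed chunk size for the example; the real value is configurable.
    static final int BYTES_PER_CHECKSUM = 512;

    /**
     * Computes one CRC32 per chunk over the buffer's remaining bytes.
     * The caller's position and limit are left untouched.
     */
    static long[] chunkedCrcs(ByteBuffer data) {
        // duplicate() shares the content but has its own position/limit.
        ByteBuffer view = data.duplicate();
        int end = view.limit();
        int chunks = (view.remaining() + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            // Narrow the window to the current chunk (the last may be short).
            view.limit(Math.min(view.position() + BYTES_PER_CHECKSUM, end));
            crc.reset();
            crc.update(view); // reads position..limit directly from the buffer
            sums[i] = crc.getValue();
            view.limit(end);  // reopen the window for the next chunk
        }
        return sums;
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(1300);
        for (int i = 0; i < 1300; i++) {
            direct.put((byte) i);
        }
        direct.flip();
        long[] sums = chunkedCrcs(direct);
        System.out.println(sums.length + " checksums computed");
    }
}
```

The point of the sketch is only that CRC32.update(ByteBuffer) can consume a 
direct buffer in place, so the data never has to round-trip through a heap 
byte[] just to be checksummed.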
                
> Use direct buffers for data in write path
> -----------------------------------------
>
>                 Key: HDFS-3529
>                 URL: https://issues.apache.org/jira/browse/HDFS-3529
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node, performance
>    Affects Versions: 2.0.0-alpha
>            Reporter: Todd Lipcon
>            Assignee: Trevor Robinson
>         Attachments: dfsio-x86-trunk-vs-3529.png, HDFS-3529.patch
>
>
> The write path currently makes several unnecessary data copies in order to go 
> to and from byte arrays. We can improve performance by using direct byte 
> buffers to avoid the copy. This is also a prerequisite for native checksum 
> calculation (HDFS-3528).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
