[ https://issues.apache.org/jira/browse/HDFS-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Thomas updated HDFS-6560:
-------------------------------
Attachment: HDFS-3528.patch
Ran some basic performance tests on a 10^8-byte data array. Each listed time
is for a single call to verifyChunkedSums, averaged over 20 runs (a sketch of
the benchmark loop follows the numbers below).
||Implementation||CRC32||CRC32C||
|Direct buffer, existing native implementation|56.5 ms|7.3 ms|
|Direct buffer, Java implementation|81.8 ms|82.5 ms|
|Byte array, native implementation developed in this patch|55.0 ms|7.63 ms|
|Byte array, Java implementation|74.4 ms|74.7 ms|
So it seems like the native byte array implementation is essentially as fast as
the direct buffer equivalent.
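For reference, the benchmark loop is essentially the following (the class name,
the "bench" label, and the warm-up-free timing loop are simplifications of
mine; newDataChecksum, calculateChunkedSums, and verifyChunkedSums are the real
DataChecksum APIs):
{code:java}
import java.nio.ByteBuffer;
import org.apache.hadoop.util.DataChecksum;

public class VerifyBench {
  private static final int DATA_LEN = 100 * 1000 * 1000; // 10^8 bytes, as above
  private static final int BYTES_PER_CHUNK = 512;        // dfs.bytes-per-checksum default

  public static void main(String[] args) throws Exception {
    DataChecksum sum = DataChecksum.newDataChecksum(
        DataChecksum.Type.CRC32C, BYTES_PER_CHUNK);
    int numChunks = (DATA_LEN + BYTES_PER_CHUNK - 1) / BYTES_PER_CHUNK;

    // Direct-buffer case; swap in ByteBuffer.wrap(new byte[...]) for the
    // array-backed path this patch targets.
    ByteBuffer data = ByteBuffer.allocateDirect(DATA_LEN);
    ByteBuffer checksums =
        ByteBuffer.allocateDirect(numChunks * sum.getChecksumSize());
    sum.calculateChunkedSums(data, checksums); // populate valid checksums first

    long totalNanos = 0;
    for (int i = 0; i < 20; i++) {             // average over 20 runs
      data.clear();                            // reset position/limit each run
      checksums.clear();
      long t0 = System.nanoTime();
      sum.verifyChunkedSums(data, checksums, "bench", 0);
      totalNanos += System.nanoTime() - t0;
    }
    System.out.printf("avg verifyChunkedSums: %.1f ms%n", totalNanos / 20 / 1e6);
  }
}
{code}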
Next, I ran a test on a single-node cluster (DN had 10 spinning disks) where I
wrote a 1 GB file (128 MB block size, all other cluster defaults in place).
Averages over 20 runs:
Without change: 128.3 MB/s
With change: 128.4 MB/s
The difference here is not significant, presumably because the spinning disks
are the bottleneck. This matches Trevor Robinson's results from HDFS-3529 (he
refactored the write-side code to use direct buffers so that the direct
buffer-based native implementation could be used): he saw a significant
performance improvement in a setup with SSD drives, so I assume this patch
would show a similar improvement on SSD-backed DNs as well. Once there is some
discussion on HDFS-6561, I can try to implement client-side native
checksumming and see if that changes things.
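For reference, the write test boils down to a loop like this (the class name,
output path, and 64 KB write size are illustrative choices of mine;
FileSystem.create and FSDataOutputStream are the standard client APIs):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteThroughput {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // cluster defaults, 128 MB blocks
    FileSystem fs = FileSystem.get(conf);

    long fileSize = 1024L * 1024 * 1024;      // 1 GB, as in the test above
    byte[] buf = new byte[64 * 1024];         // write in 64 KB chunks

    long t0 = System.nanoTime();
    try (FSDataOutputStream out = fs.create(new Path("/bench/1gb"), true)) {
      for (long written = 0; written < fileSize; written += buf.length) {
        out.write(buf);
      }
    } // close() included in the timed region
    double secs = (System.nanoTime() - t0) / 1e9;
    System.out.printf("%.1f MB/s%n", fileSize / (1024.0 * 1024.0) / secs);
  }
}
{code}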
> Byte array native checksumming on DN side
> -----------------------------------------
>
> Key: HDFS-6560
> URL: https://issues.apache.org/jira/browse/HDFS-6560
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, hdfs-client, performance
> Reporter: James Thomas
> Assignee: James Thomas
> Attachments: HDFS-3528.patch
>
>