[ https://issues.apache.org/jira/browse/HDFS-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Thomas updated HDFS-6560:
-------------------------------
Attachment: HDFS-3528.patch
Ran some basic performance tests on a 10^8-byte data array. Each listed time
is for a single call to verifyChunkedSums, averaged over 20 runs (a sketch of
the benchmark loop follows the numbers below).
||Implementation||CRC32||CRC32C||
|Direct buffer, existing native implementation|56.5 ms|7.3 ms|
|Direct buffer, Java implementation|81.8 ms|82.5 ms|
|Byte array, native implementation developed in this patch|55.0 ms|7.63 ms|
|Byte array, Java implementation|74.4 ms|74.7 ms|
So it seems like the native byte array implementation is essentially as fast as
the direct buffer equivalent.
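For reference, the benchmark loop is essentially the following (the class name,
the "bench" label, and the warm-up-free timing loop are simplifications of
mine; newDataChecksum, calculateChunkedSums, and verifyChunkedSums are the real
DataChecksum APIs):
{code:java}
import java.nio.ByteBuffer;
import org.apache.hadoop.util.DataChecksum;

public class VerifyBench {
  private static final int DATA_LEN = 100 * 1000 * 1000; // 10^8 bytes, as above
  private static final int BYTES_PER_CHUNK = 512;        // dfs.bytes-per-checksum default

  public static void main(String[] args) throws Exception {
    DataChecksum sum = DataChecksum.newDataChecksum(
        DataChecksum.Type.CRC32C, BYTES_PER_CHUNK);
    int numChunks = (DATA_LEN + BYTES_PER_CHUNK - 1) / BYTES_PER_CHUNK;

    // Direct-buffer case; swap in ByteBuffer.wrap(new byte[...]) for the
    // array-backed path this patch targets.
    ByteBuffer data = ByteBuffer.allocateDirect(DATA_LEN);
    ByteBuffer checksums =
        ByteBuffer.allocateDirect(numChunks * sum.getChecksumSize());
    sum.calculateChunkedSums(data, checksums); // populate valid checksums first

    long totalNanos = 0;
    for (int i = 0; i < 20; i++) {             // average over 20 runs
      data.clear();                            // reset position/limit each run
      checksums.clear();
      long t0 = System.nanoTime();
      sum.verifyChunkedSums(data, checksums, "bench", 0);
      totalNanos += System.nanoTime() - t0;
    }
    System.out.printf("avg verifyChunkedSums: %.1f ms%n", totalNanos / 20 / 1e6);
  }
}
{code}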
Next, I ran a test on a single-node cluster (DN had 10 spinning disks) where I
wrote a 1 GB file (128 MB block size, all other cluster defaults in place).
Averages over 20 runs:
Without change: 128.3 MB/s
With change: 128.4 MB/s
The difference here is not significant, presumably because the spinning disks
are the bottleneck. This matches Trevor Robinson's results from HDFS-3529 (he
refactored the write-side code to use direct buffers so that the direct
buffer-based native implementation could be used): he saw a significant
performance improvement in a setup with SSD drives, so I assume this patch
would show a similar improvement on SSD-backed DNs as well. Once there is some
discussion on HDFS-6561, I can try to implement client-side native
checksumming and see if that changes things.
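For reference, the write test boils down to a loop like this (the class name,
output path, and 64 KB write size are illustrative choices of mine;
FileSystem.create and FSDataOutputStream are the standard client APIs):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteThroughput {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // cluster defaults, 128 MB blocks
    FileSystem fs = FileSystem.get(conf);

    long fileSize = 1024L * 1024 * 1024;      // 1 GB, as in the test above
    byte[] buf = new byte[64 * 1024];         // write in 64 KB chunks

    long t0 = System.nanoTime();
    try (FSDataOutputStream out = fs.create(new Path("/bench/1gb"), true)) {
      for (long written = 0; written < fileSize; written += buf.length) {
        out.write(buf);
      }
    } // close() included in the timed region
    double secs = (System.nanoTime() - t0) / 1e9;
    System.out.printf("%.1f MB/s%n", fileSize / (1024.0 * 1024.0) / secs);
  }
}
{code}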
> Byte array native checksumming on DN side
> -----------------------------------------
>
> Key: HDFS-6560
> URL: https://issues.apache.org/jira/browse/HDFS-6560
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, hdfs-client, performance
> Reporter: James Thomas
> Assignee: James Thomas
> Attachments: HDFS-3528.patch
>
>