[ 
https://issues.apache.org/jira/browse/HADOOP-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386859#comment-14386859
 ] 

Edward Nevill commented on HADOOP-11660:
----------------------------------------

Hi,

I have revised the patch to include the changes requested above. I have also 
updated test_bulk_crc32.c so it prints out the times for 16384 bytes @ 512 
bytes per checksum X 1000000 iterations for both the Castagnoli and Zlib 
polynomials.
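For readers without the patch to hand, the shape of such a timing loop can be sketched roughly as below. This is not the code in test_bulk_crc32.c: `crc32_bitwise`, `time_chunked_crc` and `crc_self_check` are hypothetical names, and the bit-at-a-time CRC is a stand-in that is far slower than Hadoop's native implementation. Only the two reflected polynomial constants are standard values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Standard reflected polynomial constants. */
#define POLY_CASTAGNOLI 0x82F63B78u /* CRC32C */
#define POLY_ZLIB       0xEDB88320u /* CRC32 as used by zlib */

/* Hypothetical bit-at-a-time CRC, used here only as a stand-in. */
uint32_t crc32_bitwise(uint32_t poly, const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (poly & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Checksum data_len bytes in chunks of bytes_per_checksum, repeated
 * `iterations` times. One checksum per chunk is written to sums, which
 * must have room for every chunk. Returns elapsed CPU time in seconds. */
double time_chunked_crc(uint32_t poly, const uint8_t *data, size_t data_len,
                        size_t bytes_per_checksum, long iterations,
                        uint32_t *sums)
{
    clock_t start = clock();
    for (long it = 0; it < iterations; it++) {
        size_t n = 0;
        for (size_t off = 0; off < data_len; off += bytes_per_checksum) {
            size_t left = data_len - off;
            size_t chunk = left < bytes_per_checksum ? left : bytes_per_checksum;
            sums[n++] = crc32_bitwise(poly, data + off, chunk);
        }
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Sanity check against the well-known check values for "123456789". */
void crc_self_check(void)
{
    const uint8_t msg[] = "123456789";
    assert(crc32_bitwise(POLY_CASTAGNOLI, msg, 9) == 0xE3069283u);
    assert(crc32_bitwise(POLY_ZLIB, msg, 9) == 0xCBF43926u);
}
```

With 16384 bytes of data and 512 bytes per checksum, each pass produces 32 checksums, so `sums` would need at least 32 entries.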

The following are the results I get for x86_64 before and after. I have done 5 
runs of each.

BEFORE

{code}
[ed@mylittlepony hadoop]$ ./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 1.12
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 13.8
./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32: SUCCESS.
[ed@mylittlepony hadoop]$ ./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 1.12
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 13.84
./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32: SUCCESS.
[ed@mylittlepony hadoop]$ ./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 1.1
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 13.85
./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32: SUCCESS.
[ed@mylittlepony hadoop]$ ./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 1.12
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 13.94
./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32: SUCCESS.
[ed@mylittlepony hadoop]$ ./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 1.12
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 13.81
./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32: SUCCESS.
{code}

AFTER

{code}
[ed@mylittlepony hadoop]$ ./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 1.11
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 13.92
./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32: SUCCESS.
[ed@mylittlepony hadoop]$ ./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 1.12
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 13.99
./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32: SUCCESS.
[ed@mylittlepony hadoop]$ ./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 1.11
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 13.9
./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32: SUCCESS.
[ed@mylittlepony hadoop]$ ./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 1.12
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 13.92
./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32: SUCCESS.
[ed@mylittlepony hadoop]$ ./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 1.12
CRC 16384 bytes @ 512 bytes per checksum X 1000000 iterations = 13.92
./hadoop-common-project/hadoop-common/target/native/test_bulk_crc32: SUCCESS.
{code}

Looking at the average time over the 5 runs gives:

BEFORE

Castagnoli average = 1.116 sec
Zlib average = 13.848 sec

AFTER

Castagnoli average = 1.116 sec
Zlib average = 13.93 sec

So the performance for the Castagnoli polynomial is unchanged. For the Zlib 
polynomial there appears to be a degradation of about 0.6% (13.848 sec to 
13.93 sec), which may be experimental error. In any case, the Zlib polynomial 
is unaccelerated on x86 because it is not supported by x86 HW, and it is not 
used for HDFS.
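As a cross-check, the Zlib averages and the 0.6% figure can be re-derived from the five timings in each set of runs. The sketch below is only illustrative arithmetic; the helper names `mean`, `percent_change` and `zlib_slowdown_percent` are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Zlib timings in seconds, copied from the BEFORE and AFTER runs
 * (the second CRC line of each run). */
const double zlib_before[5] = {13.8, 13.84, 13.85, 13.94, 13.81};
const double zlib_after[5]  = {13.92, 13.99, 13.9, 13.92, 13.92};

/* Arithmetic mean of n samples. */
double mean(const double *xs, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += xs[i];
    return sum / (double)n;
}

/* Relative change from before to after, in percent. */
double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* (13.93 - 13.848) / 13.848 * 100, i.e. roughly 0.6 */
double zlib_slowdown_percent(void)
{
    return percent_change(mean(zlib_before, 5), mean(zlib_after, 5));
}
```

The per-run spread within each set (13.8 to 13.94 before, 13.9 to 13.99 after) is of the same order as the difference between the two averages, which is consistent with the change being within measurement noise.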

For comparison, on aarch64 partner HW I get the following averages

Castagnoli average = 3.586 sec
Zlib average = 3.580 sec

Many thanks for your help with this,
Ed.


> Add support for hardware crc on ARM aarch64 architecture
> --------------------------------------------------------
>
>                 Key: HADOOP-11660
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11660
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 3.0.0
>         Environment: ARM aarch64 development platform
>            Reporter: Edward Nevill
>            Assignee: Edward Nevill
>            Priority: Minor
>              Labels: performance
>         Attachments: jira-11660.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> This patch adds support for hardware CRC on ARM's new 64-bit architecture.
> The patch is completely conditionalized on __aarch64__.
> I have only added support for the non-pipelined version, as I benchmarked the 
> pipelined version on aarch64 and it showed no performance improvement.
> The aarch64 version supports both Castagnoli and Zlib CRCs, as both of these 
> are supported on ARM aarch64 hardware.
> To benchmark this I modified the test_bulk_crc32 test to print out the time 
> taken to CRC a 1MB dataset 1000 times.
> Before:
> CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55
> CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55
> After:
> CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57
> CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57
> So this represents roughly a 4.5X (2.55 / 0.57) performance improvement on raw CRC calculation.
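
To illustrate the kind of code the description refers to, here is a minimal sketch of a non-pipelined CRC32C update that uses the aarch64 CRC instructions when available and falls back to software otherwise. It is an assumption-laden illustration, not the patch itself: it uses the ACLE intrinsics from `<arm_acle.h>` (`__crc32cd`, `__crc32cb`), guards on `__ARM_FEATURE_CRC32` in addition to `__aarch64__`, and the names `crc32c_update` and `HAVE_HW_CRC32C` are hypothetical. The actual patch may be written differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* __ARM_FEATURE_CRC32 is defined by the compiler when the CRC extension
 * is enabled (for example with -march=armv8-a+crc). */
#if defined(__aarch64__) && defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>
#define HAVE_HW_CRC32C 1
#else
#define HAVE_HW_CRC32C 0
#endif

/* Update a CRC32C (Castagnoli) value with len more bytes.
 * Pass crc = 0 for the first call; results can be chained. */
uint32_t crc32c_update(uint32_t crc, const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    crc ^= 0xFFFFFFFFu;
#if HAVE_HW_CRC32C
    /* Non-pipelined: one 8-byte CRC instruction at a time.
     * Assumes a little-endian target; memcpy avoids unaligned loads. */
    while (len >= 8) {
        uint64_t word;
        memcpy(&word, buf, sizeof word);
        crc = __crc32cd(crc, word);
        buf += 8;
        len -= 8;
    }
    while (len--)
        crc = __crc32cb(crc, *buf++);
#else
    /* Portable bit-at-a-time fallback, reflected polynomial 0x82F63B78. */
    while (len--) {
        crc ^= *buf++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
#endif
    return crc ^ 0xFFFFFFFFu;
}
```

A Zlib-polynomial variant would follow the same shape using the `__crc32d` and `__crc32b` intrinsics in place of the `__crc32c*` ones.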



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
