[jira] [Comment Edited] (CASSANDRA-16360) CRC32 is inefficient on x86

Alexey Zotov (Jira) Tue, 13 Apr 2021 07:03:46 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-16360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314690#comment-17314690
 ]


Alexey Zotov edited comment on CASSANDRA-16360 at 4/13/21, 2:01 PM:
--------------------------------------------------------------------

I've tried two more pure-Java implementations of _CRC32C_:
- [Commons Codec library|https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/digest/PureJavaCrc32C.html]
- A custom copy of CRC32C ported from Java 9

The benchmarks show that both perform nearly the same as Snappy's 
implementation (the results are consistent across runs; this run used Java 
11.0.9):

{code:java}
[java] Benchmark                                (bufferSize)  Mode  Cnt     Score     Error  Units
[java] ChecksumBench.benchComCdcPureJavaCrc32c            31  avgt    5    95.370 ±   4.438  ns/op
[java] ChecksumBench.benchComCdcPureJavaCrc32c           131  avgt    5   212.066 ±   5.835  ns/op
[java] ChecksumBench.benchComCdcPureJavaCrc32c           517  avgt    5   768.599 ±  12.188  ns/op
[java] ChecksumBench.benchComCdcPureJavaCrc32c          2041  avgt    5  2884.305 ±  86.143  ns/op
[java] ChecksumBench.benchJava9PortForCRC32C              31  avgt    5    89.839 ±   2.027  ns/op
[java] ChecksumBench.benchJava9PortForCRC32C             131  avgt    5   233.196 ±  22.965  ns/op
[java] ChecksumBench.benchJava9PortForCRC32C             517  avgt    5   738.161 ±  17.959  ns/op
[java] ChecksumBench.benchJava9PortForCRC32C            2041  avgt    5  2561.263 ±  13.935  ns/op
[java] ChecksumBench.benchSnappyPureJavaCrc32c            31  avgt    5    96.605 ±   4.376  ns/op
[java] ChecksumBench.benchSnappyPureJavaCrc32c           131  avgt    5   239.617 ±  11.468  ns/op
[java] ChecksumBench.benchSnappyPureJavaCrc32c           517  avgt    5   815.275 ±   8.615  ns/op
[java] ChecksumBench.benchSnappyPureJavaCrc32c          2041  avgt    5  2960.709 ±  68.225  ns/op
{code}

It looks like there is no easy way to speed up the pure-Java implementations 
of _CRC32C_, so we need to decide whether we're OK to move forward with the 
migration from _CRC32_ to _CRC32C_.

I'm waiting for some input.
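
To illustrate the kind of work a pure-Java _CRC32C_ has to do, here is a minimal sketch of a byte-at-a-time, table-driven variant. It is not the code from any of the benchmarked libraries; the class name TableCrc32c and the method checksum are made up for this illustration, and the only fixed inputs are the reflected Castagnoli polynomial and the standard check string.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: a byte-at-a-time, table-driven CRC32C in plain Java. */
final class TableCrc32c {
    // Reflected (bit-reversed) form of the Castagnoli polynomial 0x1EDC6F41.
    private static final int POLY = 0x82F63B78;
    private static final int[] TABLE = buildTable();

    // Precompute the CRC of every possible byte value.
    private static int[] buildTable() {
        int[] table = new int[256];
        for (int i = 0; i < 256; i++) {
            int c = i;
            for (int bit = 0; bit < 8; bit++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ POLY : c >>> 1;
            }
            table[i] = c;
        }
        return table;
    }

    /** Returns the CRC32C of data[off, off + len) as an unsigned 32-bit value. */
    static long checksum(byte[] data, int off, int len) {
        int crc = 0xFFFFFFFF;
        for (int i = off; i < off + len; i++) {
            // One table lookup per input byte; each step depends on the previous one.
            crc = TABLE[(crc ^ data[i]) & 0xFF] ^ (crc >>> 8);
        }
        return (~crc) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] input = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("%08x%n", checksum(input, 0, input.length));
    }
}
```

Running main prints e3069283, the standard CRC-32C check value for the ASCII string "123456789". Every byte costs a table lookup that depends on the previous step, so a software-only loop like this one grows linearly with the buffer size.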

----
I've run the same scenarios (except, obviously, _benchCrc32c_ and 
_benchCrc32cNoIntrinsic_) on Java 1.8.0_281, and the results are consistent 
with Java 11.0.9.

----
The above changes are experimental. I'm not going to push them to the existing 
PR; however, they are available for review at: 
https://github.com/alex-ninja/cassandra/pull/1/files.


was (Author: azotcsit):
The benchmarks show that the native _CRC32C_ implementation is very fast even 
without the intrinsic. Here is one more test to highlight that (the results 
are consistent across runs; this run used Java 11.0.9):
{code:java}
[java] Benchmark                             (bufferSize)  Mode  Cnt     Score     Error  Units
[java] ChecksumBench.benchCrc32                        31  avgt    5   107.191 ±   5.251  ns/op
[java] ChecksumBench.benchCrc32                       131  avgt    5    83.716 ±   1.578  ns/op
[java] ChecksumBench.benchCrc32                       517  avgt    5   123.176 ±  17.512  ns/op
[java] ChecksumBench.benchCrc32                      2041  avgt    5   273.591 ±   9.123  ns/op
[java] ChecksumBench.benchCrc32cNoIntrinsic            31  avgt    5    52.850 ±   3.461  ns/op
[java] ChecksumBench.benchCrc32cNoIntrinsic           131  avgt    5    73.552 ±   1.624  ns/op
[java] ChecksumBench.benchCrc32cNoIntrinsic           517  avgt    5   196.009 ±   9.141  ns/op
[java] ChecksumBench.benchCrc32cNoIntrinsic          2041  avgt    5   278.980 ±   7.515  ns/op
[java] ChecksumBench.benchPureJavaCrc32c               31  avgt    5    98.419 ±   5.221  ns/op
[java] ChecksumBench.benchPureJavaCrc32c              131  avgt    5   239.515 ±   5.118  ns/op
[java] ChecksumBench.benchPureJavaCrc32c              517  avgt    5   828.281 ± 107.874  ns/op
[java] ChecksumBench.benchPureJavaCrc32c             2041  avgt    5  2941.934 ±  55.716  ns/op
{code}
 I've checked the implementation, and it looks like the native _CRC32C_ 
implementation performs so well because it relies heavily on _Unsafe_ 
operations. Initially I thought we could easily implement a custom _CRC32C_ 
similar to the native one; however, I no longer think it is easy enough, and I 
have two concerns:
 # we would need to use some library that wraps access to _Unsafe_
 # I'm not sure that, from a licensing perspective, we are permitted to 
"re-work" (copy-paste and adapt) the code from CRC32C

So I'm waiting for some input before moving forward in any direction.
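
For reference, both JDK checksums can be driven through the common java.util.zip.Checksum interface. The sketch below assumes Java 9 or later, since java.util.zip.CRC32C does not exist in Java 8. The class name JdkChecksums and the helper checksumOf are hypothetical names for this illustration and are not taken from ChecksumBench.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

/** Illustrative only: computes CRC32 and CRC32C with the JDK's own classes. */
final class JdkChecksums {
    /** Feeds the whole array to the given Checksum and returns its value. */
    static long checksumOf(Checksum checksum, byte[] data) {
        checksum.reset();
        checksum.update(data, 0, data.length);
        return checksum.getValue();
    }

    public static void main(String[] args) {
        byte[] payload = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("CRC32  = %08x%n", checksumOf(new CRC32(), payload));
        System.out.printf("CRC32C = %08x%n", checksumOf(new CRC32C(), payload));
    }
}
```

For the ASCII string "123456789" this prints cbf43926 for CRC32 and e3069283 for CRC32C, the standard check values of the two algorithms. Because both classes implement Checksum, a caller written against that interface can switch algorithms without other code changes.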

> CRC32 is inefficient on x86
> ---------------------------
>
>                 Key: CASSANDRA-16360
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16360
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Messaging/Client
>            Reporter: Avi Kivity
>            Priority: Normal
>              Labels: protocolv5
>             Fix For: 4.0.x
>
>
> The client/server protocol specifies CRC24 and CRC32 as the checksum 
> algorithm (cql_protocol_V5_framing.asc). Those however are expensive to 
> compute; this affects both the client and the server.
>  
> A better checksum algorithm is CRC32C, which has hardware support on x86 (as 
> well as other modern architectures).


