[ https://issues.apache.org/jira/browse/CASSANDRA-16360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314690#comment-17314690 ]
Alexey Zotov edited comment on CASSANDRA-16360 at 4/13/21, 2:01 PM:
--------------------------------------------------------------------
I've tried two more pure Java implementations of _CRC32C_:
- [Commons Codec library|https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/digest/PureJavaCrc32C.html]
- a custom copy of _CRC32C_ from Java 9
Based on the benchmarks, both perform nearly the same as Snappy's implementation (the results are consistent with other runs; this run used Java 11.0.9):
{code:java}
[java] Benchmark                                (bufferSize)  Mode  Cnt     Score    Error  Units
[java] ChecksumBench.benchComCdcPureJavaCrc32c            31  avgt    5    95.370 ±  4.438  ns/op
[java] ChecksumBench.benchComCdcPureJavaCrc32c           131  avgt    5   212.066 ±  5.835  ns/op
[java] ChecksumBench.benchComCdcPureJavaCrc32c           517  avgt    5   768.599 ± 12.188  ns/op
[java] ChecksumBench.benchComCdcPureJavaCrc32c          2041  avgt    5  2884.305 ± 86.143  ns/op
[java] ChecksumBench.benchJava9PortForCRC32C              31  avgt    5    89.839 ±  2.027  ns/op
[java] ChecksumBench.benchJava9PortForCRC32C             131  avgt    5   233.196 ± 22.965  ns/op
[java] ChecksumBench.benchJava9PortForCRC32C             517  avgt    5   738.161 ± 17.959  ns/op
[java] ChecksumBench.benchJava9PortForCRC32C            2041  avgt    5  2561.263 ± 13.935  ns/op
[java] ChecksumBench.benchSnappyPureJavaCrc32c            31  avgt    5    96.605 ±  4.376  ns/op
[java] ChecksumBench.benchSnappyPureJavaCrc32c           131  avgt    5   239.617 ± 11.468  ns/op
[java] ChecksumBench.benchSnappyPureJavaCrc32c           517  avgt    5   815.275 ±  8.615  ns/op
[java] ChecksumBench.benchSnappyPureJavaCrc32c          2041  avgt    5  2960.709 ± 68.225  ns/op
{code}
It looks like there is no easy way to improve the pure Java implementations of _CRC32C_, so we need to decide whether we're OK with moving forward with the migration from _CRC32_ to _CRC32C_.
I'm waiting for some input.
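To put the 2041-byte results in perspective, dividing the buffer size by the average time per operation gives throughput. This is plain arithmetic on the scores in the table, not an additional measurement:

```latex
\begin{aligned}
\text{throughput} &= \frac{\text{bufferSize}}{\text{score}} \\
\text{Java 9 port:}\quad \frac{2041\,\text{B}}{2561.263\,\text{ns}} &\approx 0.80\ \text{B/ns} \\
\text{Commons Codec:}\quad \frac{2041\,\text{B}}{2884.305\,\text{ns}} &\approx 0.71\ \text{B/ns} \\
\text{Snappy:}\quad \frac{2041\,\text{B}}{2960.709\,\text{ns}} &\approx 0.69\ \text{B/ns} \\
\frac{2960.709}{2561.263} &\approx 1.16
\end{aligned}
```

So the fastest pure Java variant gives only about 16% more throughput than Snappy's at this buffer size (0.80 B/ns is roughly 0.8 GB/s).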
----
I've run the same scenarios on Java 1.8.0_281 (except, obviously, _benchCrc32c_ and _benchCrc32cNoIntrinsic_, since they rely on the JDK's _CRC32C_, which only exists from Java 9 on), and the results are consistent with Java 11.0.9.
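Because _java.util.zip.CRC32C_ only exists on Java 9+, code that has to run on both Java 8 and Java 11 would need to choose an implementation at runtime. A minimal sketch, assuming reflection-based detection; _Crc32cFactory_ and _BitwiseCrc32c_ are hypothetical names, and the bit-at-a-time fallback is deliberately simple and much slower than the table-driven libraries benchmarked above:

```java
import java.lang.reflect.Constructor;
import java.util.function.Supplier;
import java.util.zip.Checksum;

// Hypothetical helper: prefer the JDK's CRC32C (Java 9+), otherwise fall back to pure Java.
final class Crc32cFactory {
    private static final Supplier<Checksum> SUPPLIER = pick();

    static Checksum newCrc32c() {
        return SUPPLIER.get();
    }

    private static Supplier<Checksum> pick() {
        try {
            // Looked up reflectively so this class still compiles and runs on Java 8.
            Class<?> cls = Class.forName("java.util.zip.CRC32C");
            Constructor<?> ctor = cls.getConstructor();
            return () -> {
                try {
                    return (Checksum) ctor.newInstance();
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            };
        } catch (ReflectiveOperationException e) {
            return BitwiseCrc32c::new; // Java 8: java.util.zip.CRC32C is not available
        }
    }

    // Minimal bit-at-a-time CRC32C (reflected Castagnoli polynomial 0x82F63B78).
    static final class BitwiseCrc32c implements Checksum {
        private int crc = 0xFFFFFFFF; // register starts inverted

        @Override
        public void update(int b) {
            crc ^= b & 0xFF;
            for (int bit = 0; bit < 8; bit++) {
                crc = (crc >>> 1) ^ (0x82F63B78 & -(crc & 1));
            }
        }

        @Override
        public void update(byte[] b, int off, int len) {
            for (int i = off; i < off + len; i++) {
                update(b[i]);
            }
        }

        @Override
        public long getValue() {
            return ~crc & 0xFFFFFFFFL; // final inversion, as an unsigned 32-bit value
        }

        @Override
        public void reset() {
            crc = 0xFFFFFFFF;
        }
    }
}
```

A multi-release JAR (Java 9+) would be an alternative that avoids reflection, at the cost of a more involved build.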
----
The changes above are experimental. I'm not going to push them to the existing PR; however, they are available for review at:
https://github.com/alex-ninja/cassandra/pull/1/files.
was (Author: azotcsit):
Based on the benchmarks, the native _CRC32C_ implementation is really fast even without the intrinsic: at 2041 bytes it takes about 279 ns/op, on par with _CRC32_ and roughly 10x faster than the pure Java version. Here is one more test to highlight that (the results are consistent with other runs; this run used Java 11.0.9):
{code:java}
[java] Benchmark                             (bufferSize)  Mode  Cnt     Score     Error  Units
[java] ChecksumBench.benchCrc32                        31  avgt    5   107.191 ±   5.251  ns/op
[java] ChecksumBench.benchCrc32                       131  avgt    5    83.716 ±   1.578  ns/op
[java] ChecksumBench.benchCrc32                       517  avgt    5   123.176 ±  17.512  ns/op
[java] ChecksumBench.benchCrc32                      2041  avgt    5   273.591 ±   9.123  ns/op
[java] ChecksumBench.benchCrc32cNoIntrinsic            31  avgt    5    52.850 ±   3.461  ns/op
[java] ChecksumBench.benchCrc32cNoIntrinsic           131  avgt    5    73.552 ±   1.624  ns/op
[java] ChecksumBench.benchCrc32cNoIntrinsic           517  avgt    5   196.009 ±   9.141  ns/op
[java] ChecksumBench.benchCrc32cNoIntrinsic          2041  avgt    5   278.980 ±   7.515  ns/op
[java] ChecksumBench.benchPureJavaCrc32c               31  avgt    5    98.419 ±   5.221  ns/op
[java] ChecksumBench.benchPureJavaCrc32c              131  avgt    5   239.515 ±   5.118  ns/op
[java] ChecksumBench.benchPureJavaCrc32c              517  avgt    5   828.281 ± 107.874  ns/op
[java] ChecksumBench.benchPureJavaCrc32c             2041  avgt    5  2941.934 ±  55.716  ns/op
{code}
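On the API side of the migration, both benchmarked JDK classes implement _java.util.zip.Checksum_, so at a call site switching from _CRC32_ to _CRC32C_ is essentially a one-line change; the two algorithms produce different values, though, so the wire format still changes. A minimal sketch (_ChecksumSwap_ and _checksum_ are illustrative names; "123456789" is the conventional CRC check input):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C; // Java 9+
import java.util.zip.Checksum;

final class ChecksumSwap {
    // Computes a checksum over the whole array with whichever implementation is passed in.
    static long checksum(Checksum checksum, byte[] data) {
        checksum.reset();
        checksum.update(data, 0, data.length);
        return checksum.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("CRC32  = %08x%n", checksum(new CRC32(), data));  // cbf43926
        System.out.printf("CRC32C = %08x%n", checksum(new CRC32C(), data)); // e3069283
    }
}
```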
I've checked the implementation, and it looks like the native _CRC32C_ implementation performs so well because it relies heavily on _Unsafe_ operations. Initially I thought we could easily implement a custom _CRC32C_ similar to the native one; however, now I don't think it is that easy, and I have two concerns:
# we would need to use some library that wraps work with _Unsafe_
# I'm not sure that, from a licensing perspective, we are permitted to "re-work" (copy-paste and adapt) the code from _CRC32C_
So I'm waiting for some input before moving forward in any direction.
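Regarding concern 1, a hypothetical illustration: the multi-byte reads that a word-at-a-time CRC needs can also be done through a little-endian _ByteBuffer_ instead of _Unsafe_. The slicing-by-4 sketch below is written from the general table-driven technique, not adapted from the JDK's _CRC32C_ source; _SlicingBy4Crc32c_ is a made-up name, and whether it comes anywhere near the native implementation's speed is an open question that would need benchmarking:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.Checksum;

// Hypothetical sketch: slicing-by-4 CRC32C reading 4-byte words through a little-endian
// ByteBuffer instead of Unsafe. Not derived from the JDK source and not benchmarked.
final class SlicingBy4Crc32c implements Checksum {
    private static final int POLY = 0x82F63B78; // reflected Castagnoli polynomial
    // T[0] is the classic byte table; T[k] advances T[k - 1] by one more zero byte.
    private static final int[][] T = new int[4][256];

    static {
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int bit = 0; bit < 8; bit++) {
                c = (c >>> 1) ^ (POLY & -(c & 1));
            }
            T[0][n] = c;
        }
        for (int k = 1; k < 4; k++) {
            for (int n = 0; n < 256; n++) {
                int prev = T[k - 1][n];
                T[k][n] = (prev >>> 8) ^ T[0][prev & 0xFF];
            }
        }
    }

    private int crc = 0xFFFFFFFF; // register starts inverted

    @Override
    public void update(int b) {
        crc = (crc >>> 8) ^ T[0][(crc ^ b) & 0xFF];
    }

    @Override
    public void update(byte[] b, int off, int len) {
        ByteBuffer buf = ByteBuffer.wrap(b, off, len).order(ByteOrder.LITTLE_ENDIAN);
        int c = crc;
        // Main loop: fold 4 bytes per iteration using the four tables.
        while (buf.remaining() >= 4) {
            c ^= buf.getInt();
            c = T[3][c & 0xFF] ^ T[2][(c >>> 8) & 0xFF]
                ^ T[1][(c >>> 16) & 0xFF] ^ T[0][c >>> 24];
        }
        // Tail: the remaining 0 to 3 bytes, one at a time.
        while (buf.hasRemaining()) {
            c = (c >>> 8) ^ T[0][(c ^ buf.get()) & 0xFF];
        }
        crc = c;
    }

    @Override
    public long getValue() {
        return ~crc & 0xFFFFFFFFL; // final inversion, as an unsigned 32-bit value
    }

    @Override
    public void reset() {
        crc = 0xFFFFFFFF;
    }
}
```

The trade-off is that _ByteBuffer_ bounds checks may cost more than raw _Unsafe_ reads; that would have to be measured with the same _ChecksumBench_ setup.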
> CRC32 is inefficient on x86
> ---------------------------
>
> Key: CASSANDRA-16360
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16360
> Project: Cassandra
> Issue Type: Improvement
> Components: Messaging/Client
> Reporter: Avi Kivity
> Priority: Normal
> Labels: protocolv5
> Fix For: 4.0.x
>
>
> The client/server protocol specifies CRC24 and CRC32 as the checksum
> algorithms (cql_protocol_V5_framing.asc). Those, however, are expensive to
> compute; this affects both the client and the server.
>
> A better checksum algorithm is CRC32C, which has hardware support on x86 (as
> well as other modern architectures).