[ 
https://issues.apache.org/jira/browse/CASSANDRA-16360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335042#comment-17335042
 ] 

Alexey Zotov commented on CASSANDRA-16360:
------------------------------------------

Thanks for the feedback!

[~benedict]
 # I'm glad to hear that it is something useful. Though I'm not sure how to get 
the existing PR merged. Please let me know if I need to take any further actions 
(e.g. update {{CHANGES.txt}}); otherwise I'll just be waiting.
 # Ok, sounds like a good suggestion. I'll add it to the list of further action 
items.
 # That's what I thought too. Then I'll just update the fix version to 4.x and 
put the work on hold for now.

{quote}That said, it looks like there might be some measurement error in your 
results, and the crc32 intrinsic version may be benefitting from tight loop 
benchmarking (so high cache occupancy). It certainly doesn't look like a slam 
dunk for urgent migration anyway.{quote}
That's an interesting point. My feeling was that all the implementations 
benefit equally from the cache occupancy. Therefore, it should not affect the 
final results significantly (obviously it affects the absolute numbers, but 
not the relative ones). At the end of the day, the optimization comes from 
the number of CPU cycles required to calculate the checksum, which does not 
seem to be directly related to the cache occupancy.

I guess you expected to see a bigger improvement for _CRC32C_ (compared to 
_CRC32_) and that's what causes the suspicion. I'm just curious, why do you 
think there should be a bigger improvement and what kind of difference did you 
expect?

In the meantime, I'll try generating a large byte array and calculating 
hashes for slices (sub-arrays) of it. That way, the data for each hash 
calculation should not already be cached, because every calculation works on 
a new slice. I'll raise one more experimental PR, so we can see the results 
and revisit the conclusion if needed.
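
A minimal sketch of what such a slice-based measurement could look like. This is not the code from the PR: the class name, method name, array size and slice size are all made up for illustration, and a real benchmark would rather be written with JMH. Note that {{java.util.zip.CRC32C}} is only available since Java 9.

```java
import java.util.Random;
import java.util.function.Supplier;
import java.util.zip.CRC32;
import java.util.zip.CRC32C; // requires Java 9+
import java.util.zip.Checksum;

// Hypothetical sketch, not the benchmark from the PR.
final class SliceChecksumSketch {

    // Checksums consecutive, non-overlapping slices of `data`.
    // The per-slice values are XOR-ed together and returned so that
    // the JIT cannot discard the checksum work as dead code.
    // A trailing partial slice (shorter than sliceSize) is skipped.
    static long checksumSlices(Supplier<Checksum> factory, byte[] data, int sliceSize) {
        Checksum checksum = factory.get();
        long combined = 0;
        for (int offset = 0; offset + sliceSize <= data.length; offset += sliceSize) {
            checksum.reset();
            checksum.update(data, offset, sliceSize);
            combined ^= checksum.getValue();
        }
        return combined;
    }

    public static void main(String[] args) {
        // Assumption: 64 MiB is larger than the CPU caches of the test machine.
        byte[] data = new byte[64 * 1024 * 1024];
        new Random(42).nextBytes(data);
        int sliceSize = 32 * 1024; // arbitrary slice size, chosen for illustration

        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long crc32 = checksumSlices(CRC32::new, data, sliceSize);
            long t1 = System.nanoTime();
            long crc32c = checksumSlices(CRC32C::new, data, sliceSize);
            long t2 = System.nanoTime();
            System.out.printf("round %d: CRC32 %d us, CRC32C %d us (%x / %x)%n",
                              round, (t1 - t0) / 1000, (t2 - t1) / 1000, crc32, crc32c);
        }
    }
}
```

Walking through the array sequentially still lets the hardware prefetcher help, so this only reduces the cache effect rather than removing it.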

[~samt]
I was going to start asking about client compatibility a bit later, but since 
you brought this point up, let me raise a couple of questions:
# Obviously there are multiple clients (java, python, etc). How should this be 
approached? Do I just need to submit tickets to the corresponding projects? 
Where can I find the whole list of clients? I just want to understand the high 
level steps (no need for detailed step-by-step instructions, it is simply too 
early for that).
# Am I right that {{InboundConnectionInitiator / OutboundConnectionInitiator / 
HandshakeProtocol}} are about internode communication, whereas 
{{PipelineConfigurator / InitialConnectionHandler}} are about client 
communication? Am I missing something here?

{quote}If you implement step 7 so that the encoders/decoders support both CRC32 
and CRC32C then this becomes two separate (possibly parallelisable) pieces of 
work.{quote}
Your suggestion makes perfect sense to me! Basically, I can proceed with this 
change alone (it would be just a small refactoring/re-design without changing 
the existing behavior). As a result of the change, {{ChecksumType.CRC32}} will 
be used in decoders/encoders (as stated in step 7). By doing that, we will need 
far fewer changes in the future when we actually start adding _CRC32C_ support. 
Do you think it is worth doing this part now, or should we wait and make the 
full change later on?
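
To illustrate the idea only, here is a rough sketch of a codec parameterized by the checksum type. Everything in it is hypothetical: {{SketchChecksumType}} and {{FrameCodecSketch}} are stand-ins and not Cassandra's actual {{ChecksumType}} or frame encoders/decoders, and the "payload followed by a 4-byte checksum" layout is a simplification rather than the real framing format. The _CRC32C_ constant needs Java 9+.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.function.Supplier;
import java.util.zip.Checksum;

// Hypothetical stand-in, NOT Cassandra's ChecksumType.
enum SketchChecksumType {
    CRC32(java.util.zip.CRC32::new),
    CRC32C(java.util.zip.CRC32C::new); // java.util.zip.CRC32C requires Java 9+

    private final Supplier<Checksum> factory;

    SketchChecksumType(Supplier<Checksum> factory) {
        this.factory = factory;
    }

    // Computes the checksum of bytes[offset, offset + length).
    long of(byte[] bytes, int offset, int length) {
        Checksum checksum = factory.get();
        checksum.update(bytes, offset, length);
        return checksum.getValue();
    }
}

// Hypothetical codec with a simplified layout: payload + 4-byte checksum trailer.
final class FrameCodecSketch {
    private static final int TRAILER_LENGTH = 4;
    private final SketchChecksumType type;

    FrameCodecSketch(SketchChecksumType type) {
        this.type = type;
    }

    // Appends the checksum of the payload as a 4-byte trailer.
    byte[] encode(byte[] payload) {
        ByteBuffer out = ByteBuffer.allocate(payload.length + TRAILER_LENGTH);
        out.put(payload);
        out.putInt((int) type.of(payload, 0, payload.length));
        return out.array();
    }

    // Verifies the trailer and returns the payload, or throws on mismatch.
    byte[] decode(byte[] frame) {
        int payloadLength = frame.length - TRAILER_LENGTH;
        if (payloadLength < 0)
            throw new IllegalArgumentException("frame too short");
        int expected = ByteBuffer.wrap(frame, payloadLength, TRAILER_LENGTH).getInt();
        int actual = (int) type.of(frame, 0, payloadLength);
        if (expected != actual)
            throw new IllegalStateException("checksum mismatch");
        return Arrays.copyOf(frame, payloadLength);
    }
}
```

With this shape, the existing behavior would correspond to always constructing the codec with {{CRC32}}; supporting _CRC32C_ later would mostly mean choosing the other constant once the protocol negotiation allows it.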
 
PS:
Even though we're not going to complete this ticket now and have to wait until 
Java 8 support in C* is abandoned, I'm still keen to get it done. If for some 
reason I miss the moment when the full-blown work can be started, please ping 
me :)

> CRC32 is inefficient on x86
> ---------------------------
>
>                 Key: CASSANDRA-16360
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16360
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Messaging/Client
>            Reporter: Avi Kivity
>            Assignee: Alexey Zotov
>            Priority: Normal
>              Labels: protocolv6
>             Fix For: 4.0.x
>
>
> The client/server protocol specifies CRC24 and CRC32 as the checksum 
> algorithm (cql_protocol_V5_framing.asc). Those however are expensive to 
> compute; this affects both the client and the server.
>  
> A better checksum algorithm is CRC32C, which has hardware support on x86 (as 
> well as other modern architectures).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

