[jira] [Commented] (CASSANDRA-14746) Ensure Netty Internode Messaging Refactor is Solid

Josh McKenzie (Jira) Thu, 01 Oct 2020 18:58:45 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-14746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205922#comment-17205922
 ]


Josh McKenzie commented on CASSANDRA-14746:
-------------------------------------------

{quote}4.0 should have better latency, more throughput, fewer threads, fewer 
context switches, less GC allocation, and faster recovery time. 
{quote}
Was this the goal of the MS rewrite? I have no horse in this race; I just 
thought its goal was to tighten up the things that were still troublesome 
after Jason's rewrite, rather than to target performance improvements 
specifically.

I'd personally advocate for "no regression on categories a-e", plus the better 
backpressure, failure tolerance, etc. that I understood to come along with the 
MS rewrite. At least in terms of what we should consider a blocker for 4.0, I 
think "don't regress" is a stance that makes sense, especially as incremental 
performance improvements are reasonable to consider for patch releases IMO.

And fwiw, the benchmarks I've seen on 4.0 show a pretty significant improvement 
in throughput if nothing else, but as a bar, no regression for a rewrite seems 
like a good low-water mark to block on.
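
To make "don't regress" concrete, here is a minimal sketch of how such a gate could be checked against benchmark results. The metric names, the sample numbers, and the 5% noise tolerance are all assumptions made up for illustration; none of them come from the ticket or from any Cassandra tooling.

```python
# Hypothetical metric names mapped to their "good" direction.
# True means higher is better, False means lower is better.
HIGHER_IS_BETTER = {
    "p99_latency_ms": False,
    "max_throughput_qps": True,
    "thread_count": False,
    "context_switches_per_s": False,
    "gc_allocation_mb_per_s": False,
    "recovery_time_s": False,
}


def regressions(baseline, candidate, tolerance=0.05):
    """Return the metrics where candidate is worse than baseline
    by more than the given relative tolerance (assumed 5% here)."""
    worse = []
    for name, higher_is_better in HIGHER_IS_BETTER.items():
        base, cand = baseline[name], candidate[name]
        if higher_is_better:
            regressed = cand < base * (1 - tolerance)
        else:
            regressed = cand > base * (1 + tolerance)
        if regressed:
            worse.append(name)
    return worse


# Made-up numbers: 3.0.x as the baseline, 4.0 as the candidate.
baseline_30 = {
    "p99_latency_ms": 10.0,
    "max_throughput_qps": 50000,
    "thread_count": 200,
    "context_switches_per_s": 1000,
    "gc_allocation_mb_per_s": 300,
    "recovery_time_s": 600,
}
candidate_40 = dict(baseline_30, max_throughput_qps=60000, thread_count=250)

# An empty list would mean nothing blocks the release under this bar.
print(regressions(baseline_30, candidate_40))
```

Under this framing, an improvement in any category is welcome but not required; only a non-empty result would count as a blocker.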

 

> Ensure Netty Internode Messaging Refactor is Solid
> --------------------------------------------------
>
>                 Key: CASSANDRA-14746
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14746
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Legacy/Streaming and Messaging
>            Reporter: Joey Lynch
>            Assignee: Joey Lynch
>            Priority: Normal
>              Labels: 4.0-QA
>             Fix For: 4.0-beta, 4.0-triage
>
>
> Before we release 4.0 let's ensure that the internode messaging refactor is 
> 100% solid. As internode messaging is naturally used in many code paths and 
> widely configurable we have a large number of cluster configurations and test 
> configurations that must be vetted.
> We plan to vary the following:
>  * Version of Cassandra 3.0.17 vs 4.0-alpha
>  * Cluster sizes with *multi-dc* deployments ranging from 6 - 100 nodes
>  * Client request rates varying between 1k QPS and 100k QPS of varying sizes 
> and shapes (BATCH, INSERT, SELECT point, SELECT range, etc ...)
>  * Internode compression
>  * Internode SSL (as well as openssl vs jdk)
>  * Internode Coalescing options
> We are looking to measure the following as appropriate:
>  * Latency distributions of reads and writes (lower is better)
>  * Scaling limit, aka maximum throughput before violating p99 latency 
> deadline of 10ms @ LOCAL_QUORUM, on a fixed hardware deployment for 100% 
> writes, 100% reads and 50-50 writes+reads (higher is better)
>  * Thread counts (lower is better)
>  * Context switches (lower is better)
>  * On-CPU time of tasks (longer periods without a context switch are better)
>  * GC allocation rates / throughput for a fixed size heap (lower allocation 
> is better)
>  * Streaming recovery time for a single node failure, i.e. can Cassandra 
> saturate the NIC
>  
> The goal is that 4.0 should have better latency, more throughput, fewer 
> threads, fewer context switches, less GC allocation, and faster recovery 
> time. I'm putting Jason Brown as the reviewer since he implemented most of 
> the internode refactor.
> Current collaborators driving this QA task: Dinesh Joshi, Jordan West, Joey 
> Lynch (Netflix), Vinay Chella (Netflix)
> Owning committer(s): Jason Brown



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
