[ 
https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861604#comment-16861604
 ] 

Joseph Lynch commented on CASSANDRA-15066:
------------------------------------------

[~vinaykumarcse] and I have been updating the patch authors and reviewers 
continually on IRC and now ASF slack as we are running tests, but since this is 
getting close to merge I just want to chime in that our small scale (12 node) 
testing is showing excellent results so far. We've been working to validate 
this patch from a real-world-deployment/scalability/performance perspective and 
at this time the patchset appears more stable and more performant than 3.0. The 
testing methodology and results are being recorded in an open 
[spreadsheet|https://docs.google.com/spreadsheets/d/1Vq_wC2q-rcG7UWim-t2leZZ4GgcuAjSREMFbG0QGy20/edit#gid=0]
 that we are updating as we test, and once this is merged we can resume 
our formal tests as part of CASSANDRA-14746.

A summary of the results so far from our (Netflix) testing:
 * The unbounded hints that we used to see under load on trunk are no longer 
there.
 * Better process-level stability (thread CPU distribution, JVM allocation, etc.)
 * Excellent CPU flamegraphs and profiles; messaging is almost never the 
dominant CPU factor.
 * Performance in this patch appears superior to 3.0.x across the board.

So far we have very thoroughly tested the following on the small cluster:
 * LOCAL_ONE with variable read <-> write balance of 4kb multi-column partitions
 * LOCAL_QUORUM with variable read <-> write balance of 4kb multi-column 
partitions
 * QUORUM with variable read <-> write balance of 4kb multi-column partitions

We have also begun verification of the following combinations of messaging:
 * Compression on, Encryption on
 * Compression on, Encryption off
 * Cross datacenter setups with 80ms of delay

> Improvements to Internode Messaging
> -----------------------------------
>
>                 Key: CASSANDRA-15066
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15066
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Messaging/Internode
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: High
>             Fix For: 4.0
>
>         Attachments: 20k_backfill.png, 60k_RPS.png, 
> 60k_RPS_CPU_bottleneck.png, backfill_cass_perf_ft_msg_tst.svg, 
> baseline_patch_vs_30x.png, increasing_reads_latency.png, 
> many_reads_cass_perf_ft_msg_tst.svg
>
>
> CASSANDRA-8457 introduced asynchronous networking to internode messaging, but 
> there have been several follow-up endeavours to improve some semantic issues. 
>  CASSANDRA-14503 and CASSANDRA-13630 are the latest such efforts, and were 
> combined some months ago into a single overarching refactor of the original 
> work, to address some of the issues that have been discovered.  Given the 
> criticality of this work to the project, we wanted to bring some more eyes to 
> bear to ensure the release goes ahead smoothly.  In doing so, we uncovered a 
> number of issues with messaging, some of them long-standing, that we felt 
> needed to be addressed.  This patch widens the scope of CASSANDRA-14503 and 
> CASSANDRA-13630 in an effort to close the book on the messaging service, at 
> least for the foreseeable future.
> The patch includes a number of clarifying refactors that touch outside of the 
> {{net.async}} package, and a number of semantic changes to the {{net.async}} 
> package itself.  We believe it clarifies the intent and behaviour of the 
> code while improving system stability, which we will outline in comments 
> below.
> https://github.com/belliottsmith/cassandra/tree/messaging-improvements



