[ https://issues.apache.org/jira/browse/CASSANDRA-13896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prince Nana Owusu Boateng updated CASSANDRA-13896:
--------------------------------------------------
     Attachment: Screen Shot 2017-09-22 at 3.31.09 PM.png
    Description: 
During our Cassandra performance testing, we see a high percentage of CPU time 
spent in the *org.apache.cassandra.utils.memory.SlabAllocator.allocate(int, 
OpOrder.Group)* method. There appears to be high contention on the 
*<nextFreeOffset>* AtomicInteger in write workloads. Threads use this field to 
track the next free offset when allocating from a region's ByteBuffer. Once the 
contention appears, neither adding more clients nor modifying write-specific 
parameters changes write throughput. Attached are Java Flight Recorder (JFR) 
details showing hot functions, along with performance results. When we see this 
contention, we still have plenty of CPU and throughput headroom (*<20%* total 
average CPU utilization and *<11%* of total storage write throughput). This 
occurs on Cassandra 3.10.0-src using Cassandra-Stress.
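To make the contention concrete, here is a minimal Java sketch of a shared bump-pointer region that hands out ByteBuffer slices by CAS-ing a single AtomicInteger offset, the pattern described above. It is modeled loosely on that description, not copied from Cassandra's SlabAllocator; the class names (SlabContentionSketch, CasRegion) and the used()/allocations() helpers are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

class SlabContentionSketch {
    // Hypothetical, simplified region: one shared ByteBuffer carved up by a
    // bump pointer that every allocating thread must advance with a CAS.
    static final class CasRegion {
        private final ByteBuffer data;
        private final AtomicInteger nextFreeOffset = new AtomicInteger(0); // the contended field
        private final AtomicInteger allocCount = new AtomicInteger(0);     // second atomic per allocation

        CasRegion(int capacity) {
            this.data = ByteBuffer.allocate(capacity);
        }

        /** Returns a {@code size}-byte slice, or null when the region is full. */
        ByteBuffer allocate(int size) {
            while (true) {
                int oldOffset = nextFreeOffset.get();
                if (oldOffset + size > data.capacity()) {
                    return null; // a real allocator would switch to a fresh region here
                }
                // Only one racing thread wins this CAS; the others re-read and retry.
                if (nextFreeOffset.compareAndSet(oldOffset, oldOffset + size)) {
                    allocCount.incrementAndGet();
                    ByteBuffer view = data.duplicate();
                    view.position(oldOffset);
                    view.limit(oldOffset + size);
                    return view.slice();
                }
            }
        }

        int allocations() { return allocCount.get(); }
        int used() { return nextFreeOffset.get(); }
    }

    public static void main(String[] args) throws InterruptedException {
        CasRegion region = new CasRegion(1 << 20); // 1 MiB region
        Thread[] workers = new Thread[8];
        for (int t = 0; t < workers.length; t++) {
            // Each worker allocates 64-byte slices until the region is exhausted.
            workers[t] = new Thread(() -> {
                while (region.allocate(64) != null) { }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        // 1 MiB / 64 B = 16384 slices, however the threads interleave.
        System.out.println(region.allocations() + " allocations, " + region.used() + " bytes used");
        // prints: 16384 allocations, 1048576 bytes used
    }
}
```

Every allocating thread funnels through the same nextFreeOffset cache line. As thread count grows, cache-line transfers and failed compareAndSet retries make each allocation more expensive, which is consistent with (though not proof of) the JFR hot spot reported above.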

Proposal:
We would like to introduce a solution that eliminates the atomic operations on 
the *<nextFreeOffset>* AtomicInteger. This implementation would allow 
concurrent allocation of ByteBuffers without atomic compareAndSet and 
incrementAndGet operations. The solution is expected to increase overall write 
performance while improving CPU utilization.
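One possible direction, sketched here only for illustration and not the patch proposed in this issue, is to amortize the shared atomic: each thread reserves a private chunk of the region with a single getAndAdd, then bump-allocates inside that chunk with plain int arithmetic. Note that this reduces, rather than fully eliminates, atomic operations on the shared offset. ChunkedRegionSketch, ChunkedRegion, and the 4 KiB chunk size are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

class ChunkedRegionSketch {
    // Hypothetical alternative, NOT the patch proposed in this issue: threads
    // reserve private chunks with one atomic op, then bump-allocate without atomics.
    static final class ChunkedRegion {
        private final ByteBuffer data;
        private final int chunkSize;
        // Still shared and atomic, but advanced once per chunk, not once per allocation.
        private final AtomicInteger nextChunkOffset = new AtomicInteger(0);
        // Per-thread cursor {next offset, chunk end}; only the owning thread reads or writes it.
        private final ThreadLocal<int[]> cursor = ThreadLocal.withInitial(() -> new int[] {0, 0});

        ChunkedRegion(int capacity, int chunkSize) {
            this.data = ByteBuffer.allocate(capacity);
            this.chunkSize = chunkSize;
        }

        /** Returns a {@code size}-byte slice, or null if it cannot be served from this region. */
        ByteBuffer allocate(int size) {
            if (size > chunkSize) {
                return null; // oversized requests would need a separate path
            }
            int[] c = cursor.get();
            if (c[0] + size > c[1]) {
                // Current chunk cannot fit the request: its unused tail is wasted.
                if (nextChunkOffset.get() >= data.capacity()) {
                    return null; // region already handed out; avoid growing the counter
                }
                int start = nextChunkOffset.getAndAdd(chunkSize);
                if (start + chunkSize > data.capacity()) {
                    return null; // region full
                }
                c[0] = start;
                c[1] = start + chunkSize;
            }
            int offset = c[0];
            c[0] += size; // plain write: no other thread allocates from this chunk
            ByteBuffer view = data.duplicate();
            view.position(offset);
            view.limit(offset + size);
            return view.slice();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ChunkedRegion region = new ChunkedRegion(1 << 20, 4096); // 1 MiB region, 4 KiB chunks
        int[] counts = new int[8];
        Thread[] workers = new Thread[counts.length];
        for (int t = 0; t < workers.length; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                while (region.allocate(64) != null) {
                    counts[id]++; // each worker only touches its own slot
                }
            });
            workers[t].start();
        }
        int total = 0;
        for (int t = 0; t < workers.length; t++) {
            workers[t].join(); // join() makes the worker's count visible here
            total += counts[t];
        }
        System.out.println(total + " allocations");
        // prints: 16384 allocations
    }
}
```

The trade-offs of this sketch: the unused tail of each thread's chunk is wasted, memory accounting becomes chunk-granular, and a ThreadLocal lookup replaces the per-allocation CAS. Whether that is a net win for Cassandra's memtable allocation would need to be measured.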

  was:
During our Cassandra performance testing, we see a high percentage of CPU time 
spent in the *org.apache.cassandra.utils.memory.SlabAllocator.allocate(int, 
OpOrder.Group)* method. There appears to be high contention on the 
*<nextFreeOffset>* AtomicInteger in write workloads. Threads use this field to 
track the next free offset when allocating from a region's ByteBuffer. Once the 
contention appears, neither adding more clients nor modifying write-specific 
parameters changes write throughput. Attached are Java Flight Recorder (JFR) 
details showing hot functions. When we see this contention, we still have 
plenty of CPU and throughput headroom (*<20%* total average CPU utilization and 
*<11%* of total storage write throughput). This occurs on Cassandra 3.10.0-src 
using Cassandra-Stress.

Proposal:
We would like to introduce a solution that eliminates the atomic operations on 
the *<nextFreeOffset>* AtomicInteger. This implementation would allow 
concurrent allocation of ByteBuffers without atomic compareAndSet and 
incrementAndGet operations. The solution is expected to increase overall write 
performance while improving CPU utilization.


I'm working on a patch for this and expect to have it ready soon.

*+Throughput:+*
Results:
Op rate                   :   18,233 op/s  [insert: 18,233 op/s]
Partition rate            :   18,233 pk/s  [insert: 18,233 pk/s]
Row rate                  :  182,364 row/s [insert: 182,364 row/s]
Latency mean              :    6.8 ms [insert: 6.8 ms]
Latency median            :    4.4 ms [insert: 4.4 ms]
Latency 95th percentile   :   19.4 ms [insert: 19.4 ms]
Latency 99th percentile   :   25.9 ms [insert: 25.9 ms]
Latency 99.9th percentile :   85.3 ms [insert: 85.3 ms]
Latency max               :  376.2 ms [insert: 376.2 ms]
Total partitions          : 32,823,392 [insert: 32,823,392]
Total errors              :          0 [insert: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:30:00

> Improving Cassandra write performance  
> ---------------------------------------
>
>                 Key: CASSANDRA-13896
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13896
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths
>         Environment: Skylake server with 2 sockets, 192GB RAM, 3x PCIe SSDs
> OS: CentOS 7.3
> Java: Oracle JDK 1.8.0_121
>            Reporter: Prince Nana Owusu Boateng
>              Labels: Performance
>             Fix For: 4.x
>
>         Attachments: Screen Shot 2017-09-22 at 11.22.43 AM.png, Screen Shot 2017-09-22 at 3.31.09 PM.png
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

