[jira] [Comment Edited] (CASSANDRA-12269) Faster write path

T Jake Luciani (JIRA) Thu, 21 Jul 2016 14:15:33 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-12269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388437#comment-15388437
 ]


T Jake Luciani edited comment on CASSANDRA-12269 at 7/21/16 9:14 PM:
---------------------------------------------------------------------

[branch|https://github.com/tjake/cassandra/tree/write-perf]
[testall|http://cassci.datastax.com/job/tjake-write-perf-testall]
[dtests|http://cassci.datastax.com/job/tjake-write-perf-dtest/]

I wasn't able to close the throughput gap, but throughput has improved and latency now
matches 2.2; see the [cstar comparison|http://cstar.datastax.com/graph?command=one_job&stats=fea54640-4ebc-11e6-a5e8-0256e416528f].

CommitLogStress is also improved by ~15%




> Faster write path
> -----------------
>
>                 Key: CASSANDRA-12269
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12269
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>             Fix For: 3.10
>
>
> The new storage engine (CASSANDRA-8099) has caused a regression in write
> performance. This ticket addresses that regression, to bring 3.0 as close
> to 2.2 as possible. After much toil, I've found four main causes:
> 1.  The cost of calculating the size of a serialized row is higher now, since
> we no longer manage the cell name and value as ByteBuffers as we did
> pre-3.0. That means we currently serialize each row twice: once to calculate
> the size and once to write the data. This happens during SSTable writes
> and was addressed in CASSANDRA-9766.
>      Double serialization is also happening in CommitLog and the 
> MessagingService.  We need to apply the same techniques to these as we did to 
> the SSTable serialization.
> 2.  Even after fixing (1), 3.0 still has more GC pressure and CPU usage
> because we encode everything from the {{Column}} to the {{Row}} to the
> {{Partition}} as a {{BTree}}. Specifically, the {{BTreeSearchIterator}} is
> used for all iterator() methods. Both of these classes are useful for efficient
> removal and searching of the trees, but for SerDe we almost always want to
> simply walk the entire tree forwards or in reverse and apply a function to
> each element. To that end, we can use lambdas and do this without any extra
> classes.
> 3.  We use a lot of thread locals and check them constantly on the read/write
> paths, for client warnings, tracing, temp buffers, etc. We should move all
> thread locals to FastThreadLocals and threads to FastThreadLocalThreads.
> 4.  The memtable flusher defaults changed in 3.2, which caused a regression;
> see CASSANDRA-12228.
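For item 1 (double serialization), the general idea is to walk a row only once and learn its size as a by-product. The sketch below shows one way to do that: encode the row into a reusable scratch buffer, then write the length prefix followed by the already-encoded bytes. It is illustrative only. The `Cell` class, the `[int length][cells...]` framing and the field encodings are hypothetical assumptions, not Cassandra's on-disk or wire format, and this is not necessarily the technique used in CASSANDRA-9766.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class SinglePassRowWriter {
    /** Hypothetical cell (column name plus value), for illustration only. */
    static final class Cell {
        final String name;
        final byte[] value;

        Cell(String name, byte[] value) {
            this.name = name;
            this.value = value;
        }
    }

    // Reused across rows so each row does not allocate a fresh scratch buffer.
    private final ByteArrayOutputStream scratch = new ByteArrayOutputStream(256);
    private final DataOutputStream scratchOut = new DataOutputStream(scratch);

    /** Writes one row as [int length][cells...], encoding the cells exactly once. */
    void writeRow(List<Cell> row, DataOutputStream out) {
        try {
            scratch.reset();
            for (Cell c : row) {
                byte[] name = c.name.getBytes(StandardCharsets.UTF_8);
                scratchOut.writeShort(name.length); // hypothetical name-length field
                scratchOut.write(name);
                scratchOut.writeInt(c.value.length); // hypothetical value-length field
                scratchOut.write(c.value);
            }
            scratchOut.flush();
            // The size falls out of the single pass; no separate sizing traversal.
            out.writeInt(scratch.size());
            scratch.writeTo(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off is an extra memory copy from the scratch buffer in exchange for skipping a second full traversal, which pays off when computing sizes is as expensive as serializing.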
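Item 2 proposes replacing iterator objects with a function applied to every element during a full forward or reverse walk. A minimal sketch of that pattern follows. It uses a hypothetical node layout (sorted keys plus, for branch nodes, `keys.length + 1` children) that is deliberately simpler than Cassandra's actual {{BTree}} representation, and `TreeWalk.apply` is an illustrative name, not Cassandra's API.

```java
import java.util.function.Consumer;

final class TreeWalk {
    /** Hypothetical B-tree node: sorted keys; branches have keys.length + 1 children. */
    static final class Node {
        final Object[] keys;
        final Node[] children; // null for a leaf

        Node(Object[] keys, Node[] children) {
            this.keys = keys;
            this.children = children;
        }
    }

    /** Applies fn to every key in ascending (or descending) order without creating an iterator. */
    @SuppressWarnings("unchecked")
    static <V> void apply(Node node, Consumer<? super V> fn, boolean reversed) {
        int n = node.keys.length;
        if (node.children == null) { // leaf: just scan the key array
            if (reversed) {
                for (int i = n - 1; i >= 0; i--) fn.accept((V) node.keys[i]);
            } else {
                for (int i = 0; i < n; i++) fn.accept((V) node.keys[i]);
            }
            return;
        }
        if (!reversed) { // in-order: child[i], key[i], ..., child[n]
            for (int i = 0; i < n; i++) {
                TreeWalk.<V>apply(node.children[i], fn, false);
                fn.accept((V) node.keys[i]);
            }
            TreeWalk.<V>apply(node.children[n], fn, false);
        } else { // reverse in-order: child[n], key[n-1], child[n-1], ...
            TreeWalk.<V>apply(node.children[n], fn, true);
            for (int i = n - 1; i >= 0; i--) {
                fn.accept((V) node.keys[i]);
                TreeWalk.<V>apply(node.children[i], fn, true);
            }
        }
    }
}
```

The traversal state lives on the call stack instead of in a heap-allocated iterator. A capturing lambda may still allocate once per call, so the saving is per element rather than free.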
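For item 3, the FastThreadLocal/FastThreadLocalThread classes (presumably Netty's `io.netty.util.concurrent` classes) avoid the hash-map lookup of `java.lang.ThreadLocal` by giving each thread-local a fixed index into an array owned by a special thread subclass, and falling back to a regular `ThreadLocal` on ordinary threads. The sketch below is a simplified model of that idea only. `SlotThread` and `IndexedLocal` are hypothetical names, not Netty's API, and the sketch assumes the initial-value supplier never returns `null`.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical thread subclass that carries a plain array of per-thread slots. */
class SlotThread extends Thread {
    Object[] slots = new Object[8]; // touched only by this thread, so no locking

    SlotThread(Runnable r) {
        super(r);
    }
}

/** Hypothetical index-based thread-local: one array read instead of a hash lookup. */
final class IndexedLocal<T> {
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();

    private final int index = NEXT_INDEX.getAndIncrement(); // fixed slot for this local
    private final Supplier<? extends T> initial;
    private final ThreadLocal<T> fallback; // used on threads that are not SlotThreads

    IndexedLocal(Supplier<? extends T> initial) {
        this.initial = initial;
        this.fallback = ThreadLocal.withInitial(initial);
    }

    @SuppressWarnings("unchecked")
    T get() {
        Thread t = Thread.currentThread();
        if (!(t instanceof SlotThread)) {
            return fallback.get(); // slow path: regular ThreadLocal
        }
        SlotThread st = (SlotThread) t;
        if (index >= st.slots.length) { // grow the slot array on demand
            st.slots = Arrays.copyOf(st.slots, Math.max(index + 1, st.slots.length * 2));
        }
        Object v = st.slots[index];
        if (v == null) { // null means "not yet initialized" (supplier assumed non-null)
            v = initial.get();
            st.slots[index] = v;
        }
        return (T) v;
    }
}
```

The fast path only pays off when the hot threads are themselves the special subclass, which is why the ticket pairs moving to FastThreadLocals with moving threads to FastThreadLocalThreads.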



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
