T Jake Luciani created CASSANDRA-12269:
------------------------------------------
Summary: Faster write path
Key: CASSANDRA-12269
URL: https://issues.apache.org/jira/browse/CASSANDRA-12269
Project: Cassandra
Issue Type: Improvement
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Fix For: 3.10
The new storage engine (CASSANDRA-8099) caused a regression in both write and
read performance. This ticket addresses those regressions, with the goal of
bringing 3.0 as close to 2.2 as possible. After much toil, I've found four main
causes:
1. The cost of calculating the size of a serialized row is higher now since we
no longer have the cell name and value managed as ByteBuffers as we did
pre-3.0. That means we currently serialize the row twice: once to calculate
the size and once to write the data. This happens during SSTable writes
and was addressed in CASSANDRA-9766.
Double serialization also happens in the CommitLog and the
MessagingService. We need to apply the same techniques there that we applied
to SSTable serialization.
2. Even after fixing (1), 3.0 still has more GC pressure and CPU usage because
we encode everything from the {{Column}} to the {{Row}} to the {{Partition}}
as a {{BTree}}. Specifically, the {{BTreeSearchIterator}} is used for all
{{iterator()}} methods. Both classes are useful for efficient removal and
searching of the trees, but for SerDe we almost always want to simply walk the
entire tree forwards or in reverse and apply a function to each element. To
that end, we can use lambdas and do this without any extra classes.
3. We use a lot of thread locals (for client warnings, tracing, temp buffers,
etc.) and check them constantly on the read/write paths. We should move all
thread locals to FastThreadLocals and all threads to FastThreadLocalThreads.
4. We changed the memtable flusher defaults in 3.2, which caused a regression;
see CASSANDRA-12228.
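
To illustrate item 1, here is a minimal sketch of one way to avoid the second
serialization pass: encode the row once into a reusable scratch buffer, then
take the size from the bytes already produced. This is not necessarily the
technique used in CASSANDRA-9766. {{SketchRow}} and {{SinglePassWriter}} are
hypothetical names made up for this example and are not Cassandra classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical row type for illustration only; this is not Cassandra's Row.
final class SketchRow {
    final String name;
    final byte[] value;

    SketchRow(String name, byte[] value) {
        this.name = name;
        this.value = value;
    }

    // Encodes the body as [int nameLength][name bytes][int valueLength][value bytes].
    void serializeBody(DataOutputStream out) throws IOException {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        out.writeInt(nameBytes.length);
        out.write(nameBytes);
        out.writeInt(value.length);
        out.write(value);
    }
}

// Hypothetical writer: encodes each row once, then derives the size
// from the bytes produced instead of walking the row a second time.
final class SinglePassWriter {
    // Scratch buffer reused across rows (not thread-safe; one writer per thread assumed).
    private final ByteArrayOutputStream scratch = new ByteArrayOutputStream(256);
    private final DataOutputStream body = new DataOutputStream(scratch);

    // Writes [int size][body] without a separate size-calculation pass over the row.
    void write(SketchRow row, DataOutputStream out) throws IOException {
        scratch.reset();
        row.serializeBody(body);
        out.writeInt(scratch.size());
        scratch.writeTo(out);
    }
}
```

The trade-off in this sketch is one extra copy from the scratch buffer to the
output in exchange for walking the row only once.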
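
For item 2, a rough sketch of the "walk the tree and apply a function" idea.
{{SketchNode}} is a hypothetical sorted binary tree used as a stand-in; it is
not Cassandra's {{BTree}}, whose layout is different. The point is only that a
full forward or reverse walk can take a lambda and needs no iterator object.

```java
import java.util.function.Consumer;

// Hypothetical sorted binary tree node for illustration; not Cassandra's BTree.
final class SketchNode<T> {
    final SketchNode<T> left;
    final T value;
    final SketchNode<T> right;

    SketchNode(SketchNode<T> left, T value, SketchNode<T> right) {
        this.left = left;
        this.value = value;
        this.right = right;
    }

    // Visits every element in sorted order (or reverse order when reversed is true)
    // and applies fn to it. No iterator is created for the traversal.
    static <T> void apply(SketchNode<T> node, Consumer<? super T> fn, boolean reversed) {
        if (node == null) {
            return;
        }
        apply(reversed ? node.right : node.left, fn, reversed);
        fn.accept(node.value);
        apply(reversed ? node.left : node.right, fn, reversed);
    }
}
```

A caller would pass the per-element work directly, for example
{{SketchNode.apply(root, out::add, false)}} to collect elements into a list
named {{out}}.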
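
For item 3, assuming FastThreadLocal and FastThreadLocalThread refer to Netty's
classes in io.netty.util.concurrent: the Netty documentation describes a
FastThreadLocal as looking up its value by a constant index in an array rather
than through a hash table, and only when the current thread is a
FastThreadLocalThread. The sketch below imitates that mechanism in plain Java.
{{SlotThread}} and {{IndexedLocal}} are hypothetical names for this example,
not Netty or Cassandra APIs.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical thread subclass that carries its own slot array.
final class SlotThread extends Thread {
    Object[] slots = new Object[8];

    SlotThread(Runnable task) {
        super(task);
    }
}

// Hypothetical thread-local variant: each instance owns a fixed array index.
final class IndexedLocal<T> {
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();

    private final int index = NEXT_INDEX.getAndIncrement();
    // Used only when the current thread is not a SlotThread.
    private final ThreadLocal<T> fallback = new ThreadLocal<>();

    @SuppressWarnings("unchecked")
    T get() {
        Thread current = Thread.currentThread();
        if (current instanceof SlotThread) {
            // Fast path: a plain array read, no hashing.
            Object[] slots = ((SlotThread) current).slots;
            return index < slots.length ? (T) slots[index] : null;
        }
        return fallback.get();
    }

    void set(T value) {
        Thread current = Thread.currentThread();
        if (current instanceof SlotThread) {
            SlotThread thread = (SlotThread) current;
            if (index >= thread.slots.length) {
                // Grow the per-thread array so this index fits.
                int newLength = Math.max(index + 1, thread.slots.length * 2);
                thread.slots = Arrays.copyOf(thread.slots, newLength);
            }
            thread.slots[index] = value;
        } else {
            fallback.set(value);
        }
    }
}
```

This is why both halves of the change matter in the sketch: a value read on an
ordinary thread still goes through the slower fallback path.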