[
https://issues.apache.org/jira/browse/CASSANDRA-12269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389520#comment-15389520
]
Sylvain Lebresne commented on CASSANDRA-12269:
----------------------------------------------
I went very quickly over the patch out of curiosity, and have a few remarks
(not at all a thorough review, mostly nitpicks):
* We don't need {{clusteringTypes}} in {{CFMetadata}}. Not sure why
{{SerializationHeader}} has that {{typesOf}} method, but we can simply get the
list of types through {{metadata.comparator.subtypes()}}.
* *Really* dislike making the fields public in {{Columns}} and {{BTreeRow}}.
It both leaks the abstraction and is, imo, dangerous. From a quick glance, it
seems we can easily avoid both by simply adding an {{apply(Function f)}} method
to both {{Row}} and {{Columns}}.
* The reuse of {{SerializationHeader}} feels pretty ugly to me. Are we really
gaining much by saving the allocation of that one object (it's not like it's
allocated for every cell we read)?
* Using arrays for wrapping single ints for lambdas also doesn't feel too
elegant to me. Would prefer adding a very simple IntWrapper class with some
{{add()}}/{{get()}} methods in utils.
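
To make the {{apply}} and {{IntWrapper}} suggestions concrete, here is a minimal sketch. Everything in it is a hypothetical stand-in: {{SimpleColumns}} is not the real {{Columns}} class, it uses {{Consumer}} where the real method might take a different functional type, and {{IntWrapper}} is only the small helper proposed above, not an existing utility.

```java
import java.util.function.Consumer;

public class ApplySketch {
    // Hypothetical mutable int holder, so a lambda can accumulate into it
    // instead of capturing a one-element int[].
    static final class IntWrapper {
        private int value;

        void add(int delta) { value += delta; }

        int get() { return value; }
    }

    // Hypothetical stand-in for a class such as Columns: the backing array
    // stays private and callers walk it only through apply().
    static final class SimpleColumns {
        private final String[] names;

        SimpleColumns(String... names) { this.names = names.clone(); }

        void apply(Consumer<String> f) {
            for (String name : names)
                f.accept(name);
        }
    }

    // Sums the lengths of all column names without exposing the array.
    static int totalNameLength(SimpleColumns columns) {
        IntWrapper total = new IntWrapper();
        columns.apply(name -> total.add(name.length()));
        return total.get();
    }

    public static void main(String[] args) {
        SimpleColumns columns = new SimpleColumns("id", "name", "value");
        System.out.println(totalNameLength(columns));
    }
}
```

The caller never sees the backing array, and the accumulator reads as a named type rather than an {{int[1]}}.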
> Faster write path
> -----------------
>
> Key: CASSANDRA-12269
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12269
> Project: Cassandra
> Issue Type: Improvement
> Reporter: T Jake Luciani
> Assignee: T Jake Luciani
> Fix For: 3.10
>
>
> The new storage engine (CASSANDRA-8099) has caused a regression in write
> performance. This ticket is to address it and bring 3.0 as close to 2.2 as
> possible. There are four main reasons for this I've discovered after much
> toil:
> 1. The cost of calculating the size of a serialized row is higher now since
> we no longer have the cell name and value managed as ByteBuffers as we did
> pre-3.0. That means we currently serialize the row twice, once to calculate
> the size and once to write the data. This happens during the SSTable writes
> and was addressed in CASSANDRA-9766.
> Double serialization is also happening in CommitLog and the
> MessagingService. We need to apply the same techniques to these as we did to
> the SSTable serialization.
> 2. Even after fixing (1) there is still an issue with there being more GC
> pressure and CPU usage in 3.0 due to the fact that we encode everything from
> the {{Column}} to the {{Row}} to the {{Partition}} as a {{BTree}}.
> Specifically, the {{BTreeSearchIterator}} is used for all iterator() methods.
> Both these classes are useful for efficient removal and searching of the
> trees but in the case of SerDe we almost always want to simply walk the
> entire tree forwards or reversed and apply a function to each element. To
> that end, we can use lambdas and do this without any extra classes.
> 3. We use a lot of thread locals and check them constantly on the read/write
> paths (for client warnings, tracing, temp buffers, etc.). We should move all
> thread locals to FastThreadLocals and threads to FastThreadLocalThreads.
> 4. We changed the memtable flusher defaults in 3.2, which caused a
> regression; see CASSANDRA-12228.
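
As an illustration of point 2 in the description, the sketch below walks a tree forwards or in reverse and applies a lambda to each element, without creating an iterator object. It assumes a hypothetical, simplified binary tree ({{Node}}); the real {{BTree}} is more involved, so this shows only the shape of the idea, not the patch itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class TreeWalkSketch {
    // Hypothetical, simplified binary tree node used only for illustration.
    static final class Node<T> {
        final Node<T> left;
        final T value;
        final Node<T> right;

        Node(Node<T> left, T value, Node<T> right) {
            this.left = left;
            this.value = value;
            this.right = right;
        }
    }

    // Visits every element in order (or in reverse order when reversed is
    // true) and passes it to f. Traversal state lives on the call stack, so
    // no separate iterator object is needed.
    static <T> void apply(Node<T> node, Consumer<? super T> f, boolean reversed) {
        if (node == null)
            return;
        apply(reversed ? node.right : node.left, f, reversed);
        f.accept(node.value);
        apply(reversed ? node.left : node.right, f, reversed);
    }

    // Collects the visited elements into a list, to show the visiting order.
    static List<Integer> collect(Node<Integer> root, boolean reversed) {
        List<Integer> out = new ArrayList<>();
        apply(root, out::add, reversed);
        return out;
    }

    // Builds a three-element tree holding 1, 2, 3.
    static Node<Integer> sample() {
        return new Node<>(new Node<>(null, 1, null), 2, new Node<>(null, 3, null));
    }

    public static void main(String[] args) {
        System.out.println(collect(sample(), false));
        System.out.println(collect(sample(), true));
    }
}
```

A serializer would pass a lambda that writes each element to its output instead of collecting into a list.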