[
https://issues.apache.org/jira/browse/CASSANDRA-15389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967627#comment-16967627
]
Benedict Elliott Smith commented on CASSANDRA-15389:
----------------------------------------------------
Just collecting here some comments I made on GitHub:
h3. Rows.collectStats
* Could simply increment the long directly by 0xFFFFFFFF and 1, respectively,
without unpacking
* The saturation checks seem to be of limited value when performed only after
the loop terminates, and should perhaps be done on each increment? As written,
it would throw if we had an overflow from 2B to 4B, but not from 4B to 6B. Not
sure how likely either of these is.
* The right-shift to extract should probably be unsigned (though unimportant if
we haven't overflowed)
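To illustrate the three bullets above, here is a minimal sketch of the packed-counter pattern (the class and method names are hypothetical, not the actual {{Rows.collectStats}} code, and it assumes the two counts occupy the high and low 32 bits of one long):

```java
// Two 32-bit counts packed into one long: cells in the low half,
// columns in the high half. Hypothetical names for illustration only.
public final class PackedCounts
{
    private static final long COLUMN_INCREMENT = 1L << 32; // bumps the high half
    private static final long CELL_INCREMENT = 1L;         // bumps the low half

    private long packed;

    public void addCell()
    {
        packed += CELL_INCREMENT;
        checkSaturation();
    }

    public void addColumn()
    {
        packed += COLUMN_INCREMENT;
        checkSaturation();
    }

    // Checking on every increment catches the first time either half reaches
    // its maximum; a single post-loop check could miss a count that wrapped
    // all the way around and looks valid again.
    private void checkSaturation()
    {
        if ((packed & 0xFFFFFFFFL) == 0xFFFFFFFFL || (packed >>> 32) == 0xFFFFFFFFL)
            throw new IllegalStateException("count saturated");
    }

    // Unsigned shift (>>>) so a set sign bit in the high half cannot
    // sign-extend into the extracted value.
    public int columns() { return (int) (packed >>> 32); }
    public int cells()   { return (int) (packed & 0xFFFFFFFFL); }
}
```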
h3. SerializationHeader
Not sure if this would be an improvement or not, but
{{FullBTreeSearchIterator}} has a rewind method, and this could be hoisted into
{{SearchIterator}} to make it reusable. It's not clear this would be faster
than consulting a {{HashMap}}, particularly with the new
{{LeafBTreeSearchIterator}} that uses {{binarySearch}} without any optimisation
for the case where we are looking up the same set of values in sequence.
However, {{FullBTreeSearchIterator}} would have no indirect memory accesses for
the common case of all (or most) columns being visited, and this optimisation
could also be propagated to {{LeafBTreeSearchIterator}}, meaning fewer indirect
memory accesses overall.
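For concreteness, a minimal sketch of the hoisted-rewind idea (simplified, hypothetical types modelled loosely on the real interfaces, with a flat sorted array standing in for the btree):

```java
// Hypothetical simplification of hoisting rewind() into SearchIterator,
// so one iterator instance can be reused across rows instead of being
// reallocated per row.
interface SearchIterator<K, V>
{
    V next(K key);   // value for key, or null; keys must be asked in sorted order
    void rewind();   // reset so the same iterator can serve the next row
}

// Because lookups arrive in sorted order, we only ever scan forward from the
// last match; rewinding is just resetting the cursor, with no allocation and
// no hashing.
final class ArraySearchIterator<K extends Comparable<K>, V> implements SearchIterator<K, V>
{
    private final K[] keys;   // sorted
    private final V[] values;
    private int next = 0;

    ArraySearchIterator(K[] keys, V[] values)
    {
        this.keys = keys;
        this.values = values;
    }

    public V next(K key)
    {
        while (next < keys.length)
        {
            int cmp = keys[next].compareTo(key);
            if (cmp == 0) return values[next++];
            if (cmp > 0) return null; // key not present; keep cursor for later keys
            next++;
        }
        return null;
    }

    public void rewind() { next = 0; }
}
```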
h3. BTreeRow
* {{hasComplex}} doesn't need to use an iterator at all - we can simply search
for the first complex cell using {{BTree.find}} and the {{Cell}} equivalent of
{{Columns.findFirstComplexIdx}} - however it looks like this method isn't even
used, so we could simply remove it entirely.
* {{hasComplexDeletion}} could use the same logic to determine the
{{firstComplexIdx}}, and instead of providing a {{StopCondition}} we could
provide {{(firstComplexIdx, size)}} as the bounds to accumulate over.
These two changes would remove the need for a direction argument to the
accumulate function, and for the {{StopCondition}}, which I think would make
for an easier-to-understand API (and an easier-to-parse implementation).
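A minimal sketch of the bounded-accumulate shape being suggested (hypothetical signatures, with a flat array standing in for the btree's cells):

```java
// Hypothetical functional interface: threads a long through each cell,
// mirroring the accumulate pattern described in the ticket.
interface CellAccumulator<T>
{
    long accumulate(T cell, long value);
}

final class BoundedAccumulate
{
    // Accumulates over cells[from..to): the caller supplies the bounds
    // (e.g. starting at firstComplexIdx), so no StopCondition object and
    // no direction flag are needed, and no iterator is allocated.
    static <T> long accumulate(T[] cells, CellAccumulator<T> f, long initial, int from, int to)
    {
        long value = initial;
        for (int i = from; i < to; i++)
            value = f.accumulate(cells[i], value);
        return value;
    }
}
```

Under this shape, something like {{hasComplexDeletion}} would pass {{(firstComplexIdx, size)}} as the bounds and OR in a bit per matching cell.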
> Minimize BTree iterator allocations
> -----------------------------------
>
> Key: CASSANDRA-15389
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15389
> Project: Cassandra
> Issue Type: Sub-task
> Components: Local/Compaction
> Reporter: Blake Eggleston
> Assignee: Blake Eggleston
> Priority: Normal
> Fix For: 4.0
>
>
> Allocations of BTree iterators contribute a large amount of garbage to the
> compaction and read paths.
> This patch removes most btree iterator allocations on hot paths by:
> • using Row#apply where appropriate on frequently called methods
> (Row#digest, Row#validateData)
> • adding a BTree accumulate method. Like the apply method, this walks
> the btree with a function that takes and returns a long argument, which
> eliminates iterator allocations without adding helper object allocations
> (BTreeRow#hasComplex, BTreeRow#hasInvalidDeletions, BTreeRow#dataSize,
> BTreeRow#unsharedHeapSizeExcludingData, Rows#collectStats,
> UnfilteredSerializer#serializedRowBodySize), as well as eliminating the
> allocation of helper objects in places where apply was used previously^[1]^.
> • creating a map of columns in SerializationHeader, which lets us avoid
> allocating a btree search iterator for each row we serialize.
> These optimizations reduce garbage created during compaction by up to 13.5%.
>
> [1] the memory test does measure memory allocated by lambdas capturing objects
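As a side note on the Row#apply pattern mentioned in the description, a minimal sketch of why an apply-style walk avoids the garbage an iterator-based walk creates (hypothetical types; a flat array stands in for the btree):

```java
import java.util.function.Consumer;

// Hypothetical miniature row: apply() visits each cell in place, so no
// Iterator object is allocated per traversal. (A capturing lambda can still
// allocate, which is why the footnote above matters.)
final class MiniRow
{
    private final Object[] cells; // flat array standing in for the BTree

    MiniRow(Object... cells) { this.cells = cells; }

    void apply(Consumer<Object> fn)
    {
        for (Object cell : cells)
            fn.accept(cell);
    }
}
```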
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]