Benedict Elliott Smith created CASSANDRA-15510:
--------------------------------------------------

             Summary: BTree: Improve Building, Inserting and Transforming
                 Key: CASSANDRA-15510
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15510
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Benedict Elliott Smith
            Assignee: Benedict Elliott Smith


This work was originally undertaken as a follow-up to CASSANDRA-15367 to ensure 
performance is strictly improved, but it may no longer be needed for that 
purpose.  It’s still hugely impactful, however.  It remains to be decided where 
this should land.

The current {{BTree}} implementation is suboptimal in a number of ways, with 
very little focus having been given to its performance besides its 
memory-occupancy.  This patch aims to address that, specifically improving the 
performance and allocations involved in: building, transforming and inserting 
into a tree.

To facilitate this work, the {{BTree}} definition is modified slightly, so that 
we can perform some simple arithmetic on tree sizes.  Specifically, trees of 
depth n are defined to have a maximum capacity of {{branchFactor^n - 1}}, which 
translates into capping the number of leaf children at {{branchFactor-1}}, as 
opposed to {{branchFactor}}.  Since {{branchFactor}} is a power of 2, this 
permits fast tree size arithmetic, enabling some of these changes.

h2. Building
The static build method has been modified to utilise dedicated {{buildPerfect}} 
methods that build either perfectly dense or perfectly sparse sub-trees.  These 
perfect trees all share their {{sizeMap}} with each other, and can be built 
more efficiently than trees of arbitrary size.  The specifics are described in 
detail in the comments, but this building block can be used to construct trees 
of any size, using at most one child at each level that is not either perfectly 
sparse or perfectly dense.  Bulk methods are used where possible.

For large trees this can produce up to 30x throughput improvement and 30% 
allocation reduction vs 3.0 (TBC, and to be tested vs 4.0).

{{FastBuilder}} is introduced for building a tree in-order (or in reverse) 
without duplicate elements to resolve, without necessarily knowing the size 
upfront.  This meets the needs of most use cases.  Data is built directly into 
nodes, with up to one already-constructed node, and one partially constructed 
node, on each level, being mutated to share their contents in the event of 
insufficient data to populate the tree.  These builders are thread-locally 
shared.  These leads to minimal copying, the same sharing of {{sizeMap}} as 
above, zero wasted allocations, and results in minimal difference in 
performance between utilising the less-ergonomic static build and builder 
approach.

For large trees this leads to ~4.5x throughput improvement, and 70% reduction 
in allocations vs a normal Builder.  For small trees performance is comparable, 
but allocations similarly reduced.

h2. Inserting
It turns out that we only ever insert another tree into a tree, so we exploit 
this to implement an efficient union of two trees, operating on them directly 
via stacks in the transformer, instead of via a collection interface.  A 
builder-like object is introduced that shares functionality with 
{{FastBuilder}}, and permits us to build the result of the union directly into 
the final nodes, reusing as much of the original trees as possible.  Bulk 
methods are used where possible.

The result is not _uniformly_ faster, but is _significantly_ faster on average: 
median _improvement_ of 1.4x (that is, 2.4x total throughput), mean improvement 
of 10x.  Worst reduction is 30%, and it may be that we can isolate and 
alleviate that.  Allocations are also reduced significantly, with a median of 
30% and mean of 42% for the tested workloads.  As the trees get larger the 
improvement drops, but remains uniformly lower.

h2. Transforming
Transformations garbage overhead is minimal, i.e. the main allocations are 
those necessary to represent the new tree.  It is significantly faster and 
particularly more efficient when removing elements, utilising the shared 
functionality of the builder and transformer objects to define an efficient 
builder that reuses as much of the original tree as possible. 

We also introduce dedicated {{transform}} methods (that forbid returning 
{{null}}), and {{BiFunction}} transformations to permit efficient follow-ups.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to