Benedict Elliott Smith created CASSANDRA-15510:
--------------------------------------------------
Summary: BTree: Improve Building, Inserting and Transforming
Key: CASSANDRA-15510
URL: https://issues.apache.org/jira/browse/CASSANDRA-15510
Project: Cassandra
Issue Type: Improvement
Reporter: Benedict Elliott Smith
Assignee: Benedict Elliott Smith
This work was originally undertaken as a follow-up to CASSANDRA-15367 to ensure
performance is strictly improved, but it may no longer be needed for that
purpose. It’s still hugely impactful, however. It remains to be decided where
this should land.
The current {{BTree}} implementation is suboptimal in a number of ways, with
very little focus having been given to its performance besides its
memory-occupancy. This patch aims to address that, specifically improving the
performance and allocations involved in: building, transforming and inserting
into a tree.
To facilitate this work, the {{BTree}} definition is modified slightly, so that
we can perform some simple arithmetic on tree sizes. Specifically, trees of
depth n are defined to have a maximum capacity of {{branchFactor^n - 1}}, which
translates into capping the number of leaf children at {{branchFactor-1}}, as
opposed to {{branchFactor}}. Since {{branchFactor}} is a power of 2, this
permits fast tree size arithmetic, enabling some of these changes.
h2. Building
The static build method has been modified to utilise dedicated {{buildPerfect}}
methods that build either perfectly dense or perfectly sparse sub-trees. These
perfect trees all share their {{sizeMap}} with each other, and can be built
more efficiently than trees of arbitrary size. The specifics are described in
detail in the comments, but this building block can be used to construct trees
of any size, using at most one child at each level that is not either perfectly
sparse or perfectly dense. Bulk methods are used where possible.
For large trees this can produce up to 30x throughput improvement and 30%
allocation reduction vs 3.0 (TBC, and to be tested vs 4.0).
{{FastBuilder}} is introduced for building a tree in-order (or in reverse)
without duplicate elements to resolve, without necessarily knowing the size
upfront. This meets the needs of most use cases. Data is built directly into
nodes, with up to one already-constructed node, and one partially constructed
node, on each level, being mutated to share their contents in the event of
insufficient data to populate the tree. These builders are thread-locally
shared. These leads to minimal copying, the same sharing of {{sizeMap}} as
above, zero wasted allocations, and results in minimal difference in
performance between utilising the less-ergonomic static build and builder
approach.
For large trees this leads to ~4.5x throughput improvement, and 70% reduction
in allocations vs a normal Builder. For small trees performance is comparable,
but allocations similarly reduced.
h2. Inserting
It turns out that we only ever insert another tree into a tree, so we exploit
this to implement an efficient union of two trees, operating on them directly
via stacks in the transformer, instead of via a collection interface. A
builder-like object is introduced that shares functionality with
{{FastBuilder}}, and permits us to build the result of the union directly into
the final nodes, reusing as much of the original trees as possible. Bulk
methods are used where possible.
The result is not _uniformly_ faster, but is _significantly_ faster on average:
median _improvement_ of 1.4x (that is, 2.4x total throughput), mean improvement
of 10x. Worst reduction is 30%, and it may be that we can isolate and
alleviate that. Allocations are also reduced significantly, with a median of
30% and mean of 42% for the tested workloads. As the trees get larger the
improvement drops, but remains uniformly lower.
h2. Transforming
Transformations garbage overhead is minimal, i.e. the main allocations are
those necessary to represent the new tree. It is significantly faster and
particularly more efficient when removing elements, utilising the shared
functionality of the builder and transformer objects to define an efficient
builder that reuses as much of the original tree as possible.
We also introduce dedicated {{transform}} methods (that forbid returning
{{null}}), and {{BiFunction}} transformations to permit efficient follow-ups.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]