[
https://issues.apache.org/jira/browse/CASSANDRA-15389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013292#comment-17013292
]
Benedict Elliott Smith commented on CASSANDRA-15389:
----------------------------------------------------
bq. Good idea about complex data accumulation in collect stats. I ended up
adding and using BiLongAccumulator to make all cases garbage free.
Sounds great! Though looking at the code currently posted, I think it’s still
capturing a reference to the {{collector}}? I think you may have simply not
posted your latest changes.
bq. This seems like it makes sense, but everything may just be looking like a
nail to me, so let me know if you have a better idea
My personal view is it's absolutely fine to duplicate code somewhere like this,
and I’ve done that in some patches I'll be posting soon. I think clarity for
the reader is also valuable, and if the methods are maintained next to each
other and clearly marked as copy/paste variants because we don't have a macro
language, I think that's fine and probably preferable.
However, I think in this case it's fine either way. Looking at your
implementation I realise it’s possible to simply define {{accept(Consumer<V> c,
V v)}} in terms of {{accept(BiConsumer<V1,V2> c, V1 v1, V2 v2)}} by simply
invoking {{accept(Consumer::accept, c, v)}}, which should be more than clear
enough.
It’s a shame this is a recursive function so we won’t be able to benefit from
inlining, and will probably incur an extra virtual invocation cost. It might
be that we anyway win by splitting implementations, but it may not be worth
overthinking.
FWIW, I think it might be helpful to treat {{apply}} to the same restructure
you’ve done to {{accumulate}} - which helped me a great deal in reading that.
{{isStopSentinel}}: do we use {{Long.MIN_VALUE}} anymore? Should we also
consider optionally permitting a stop sentinel to be provided, so we can make
{{minDeletionTime}} more efficient? I don’t think we have anywhere that needs
more than one stop sentinel anymore, so I think it’s probably an acceptable
complication.
nit: {{ComplexColumnData.accumulate}} should we call the parameter e.g.
{{initialValue}} to distinguish it from those {{accumulate}} methods that take
a value to start from in the tree?
> Minimize BTree iterator allocations
> -----------------------------------
>
> Key: CASSANDRA-15389
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15389
> Project: Cassandra
> Issue Type: Sub-task
> Components: Local/Compaction
> Reporter: Blake Eggleston
> Assignee: Blake Eggleston
> Priority: Normal
> Fix For: 4.0
>
>
> Allocations of BTree iterators contribute a lot amount of garbage to the
> compaction and read paths.
> This patch removes most btree iterator allocations on hot paths by:
> • using Row#apply where appropriate on frequently called methods
> (Row#digest, Row#validateData
> • adding BTree accumulate method. Like the apply method, this method walks
> the btree with a function that takes and returns a long argument, this
> eliminates iterator allocations without adding helper object allocations
> (BTreeRow#hasComplex, BTreeRow#hasInvalidDeletions, BTreeRow#dataSize,
> BTreeRow#unsharedHeapSizeExcludingData, Rows#collectStats,
> UnfilteredSerializer#serializedRowBodySize) as well as eliminating the
> allocation of helper objects in places where apply was used previously^[1]^.
> • Create map of columns in SerializationHeader, this lets us avoid
> allocating a btree search iterator for each row we serialize.
> These optimizations reduce garbage created during compaction by up to 13.5%
>
> [1] the memory test does measure memory allocated by lambdas capturing objects
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]