[jira] [Commented] (CASSANDRA-15389) Minimize BTree iterator allocations

Benedict Elliott Smith (Jira) Wed, 04 Dec 2019 11:09:08 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-15389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988103#comment-16988103
 ]


Benedict Elliott Smith commented on CASSANDRA-15389:
----------------------------------------------------

{quote}Implementing slice (or findAndAccumulate) would avoid reverse iteration, 
but I don’t think that would make the api or implementation easier to 
understand.
{quote}
Yes, this is what I was proposing. I think it _might_ be easier to understand, 
though I haven’t tried to implement it for comparison. My reasoning goes like 
this:
 * I find it hard to reason about different directions of iteration

 ** I just do, and perhaps others do too?
 ** It’s easy to misread something like direction, so I prefer to use it only 
when it is explicitly relevant, rather than expedient
 ** Making sure we get two methods correct may be harder than getting one 
correct, even if that one is slightly more complicated. Particularly when 
reverse iteration is less commonly reasoned about.
 * Implementation details that are not implied by the method's definition can 
be confusing
 ** {{accumulate}} is logically commutative, so a direction of iteration is a 
surprising implementation detail, which can increase the cognitive load for a 
reader as they may need to puzzle out why
 ** This implementation detail might easily be missed, given assumptions about 
the contract, and the ease with which the parameter switch might be glossed over
 * I find that minimising the bleeding of logic between callers helps clarity
 ** The reverse iteration approach requires overloading the accumulated value 
with a sentinel that means “we’ve gone too far” as well as our result. We 
supply a lambda that’s invoked on this value, and may ask to terminate 
iteration. We finally have to mask out that sentinel when returning, which 
gives three different methods these sentinels are being passed around.
 ** With an upper/lower bound, we decide the total range to look at, i.e. all 
the ColumnData, and simply iterate over it - control flow is unidirectional

With regards to the stop condition, we might want to implement something 
similar to terminate early, but I think it would be fine to supply a terminal 
value to the accumulate method - {{Long.MAX_VALUE}} would be fine for all other 
callers I think? - which would seem conceptually more straightforward to me, 
since it would not be leaking extraneous details like the iteration order into 
sentinel values that need to be unwrapped later. It would only be terminating 
the accumulation because we’ve saturated the output value.

I also don’t think it should be particularly complicating to add range 
operation to the function, and I’m happy to give it a whirl if you like for 
comparison?

I don’t want to belabour the point though, which I may have already done.  I 
think this might be cleaner, but it’s not a given and certainly not something I 
would demand be changed.

> Minimize BTree iterator allocations
> -----------------------------------
>
>                 Key: CASSANDRA-15389
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15389
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Local/Compaction
>            Reporter: Blake Eggleston
>            Assignee: Blake Eggleston
>            Priority: Normal
>             Fix For: 4.0
>
>
> Allocations of BTree iterators contribute a lot amount of garbage to the 
> compaction and read paths.
> This patch removes most btree iterator allocations on hot paths by:
>  • using Row#apply where appropriate on frequently called methods 
> (Row#digest, Row#validateData
>  • adding BTree accumulate method. Like the apply method, this method walks 
> the btree with a function that takes and returns a long argument, this 
> eliminates iterator allocations without adding helper object allocations 
> (BTreeRow#hasComplex, BTreeRow#hasInvalidDeletions, BTreeRow#dataSize, 
> BTreeRow#unsharedHeapSizeExcludingData, Rows#collectStats, 
> UnfilteredSerializer#serializedRowBodySize) as well as eliminating the 
> allocation of helper objects in places where apply was used previously^[1]^.
>  • Create map of columns in SerializationHeader, this lets us avoid 
> allocating a btree search iterator for each row we serialize.
> These optimizations reduce garbage created during compaction by up to 13.5%
>  
> [1] the memory test does measure memory allocated by lambdas capturing objects



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (CASSANDRA-15389) Minimize BTree iterator allocations

Reply via email to