[ https://issues.apache.org/jira/browse/CASSANDRA-15389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010900#comment-17010900 ]
Benedict Elliott Smith commented on CASSANDRA-15389: ---------------------------------------------------- Thanks, the patch is looking really good. Just some minor suggestions to consider. AbstractRow * If we introduce a {{BiFunction}} variant of {{apply}}, in which we can supply the second argument, then we can go garbage-free for e.g. {{digest}} by calling {{apply(ColumnData::digest, hasher)}} BTreeRow * {{hasComplexDeletion}} does not need the {{if (isSimple)}} branch anymore, I think? * {{hasComplexDeletion}} seems to be testing the wrong sentinel return value from {{accumulate}}? * Does {{HAS_INVALID_DELETIONS}} need to test {{cd != null}}? Row * {{collectStats}} could delegate to a new {{ComplexColumnData.accumulate}}, and pass itself as the parameter (though we would need to test {{!(cd instanceof ComplexColumnData)}} instead of {{cd.column.isSimple()}} * This might mean we can remove the {{hasCells()}} test, as we will be garbage-free and can let {{accumulate}} make the right decision for us BTree * Having been playing with a lot of BTree methods recently, I personally find it aids clarity to have totally separate bits of code implementing behavior for leaves and branches, with a guard at-or-near the start of the method switching behaviour. It could invoke a {{methodLeaf}} variant, or do it inline, but I think it could make it somewhat easier to follow the logic in {{accumulate}}. It’s only a small amount of code to duplicate, but the reader doesn’t have to try and play out the switches while following the flow. * I also find that when iterating branches, it is most clear to simply do one key/child pair per loop, and a final (or initial) unrolled copy of a child visit. Again, a small amount of duplication but I think easier for a reader to digest (at least, I found this helped when reading my own methods). This _should_ also lead to fewer executed instructions, and perhaps better branch predictor behaviour (hard to say, many of them might be well predicted anyway). nits: * {{AbstractRow.validateData}} should probably use method reference for clarity, i.e. {{apply(ColumnData::validate)}} * {{BTreeRow.HAS_INVALID_DELETIONS}} can be declared using lambda syntax * {{BTreeRow}} unused imports * {{BTreeRow:345}} unused {{ALWAYS_TRUE}} * {{Row}} unused imports * {{Rows}} unused {{addColumn}} * {{UnfilteredSerializer}} unused import * {{SerializationHeader}} unused imports * {{BTree}} unused import > Minimize BTree iterator allocations > ----------------------------------- > > Key: CASSANDRA-15389 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15389 > Project: Cassandra > Issue Type: Sub-task > Components: Local/Compaction > Reporter: Blake Eggleston > Assignee: Blake Eggleston > Priority: Normal > Fix For: 4.0 > > > Allocations of BTree iterators contribute a lot amount of garbage to the > compaction and read paths. > This patch removes most btree iterator allocations on hot paths by: > • using Row#apply where appropriate on frequently called methods > (Row#digest, Row#validateData > • adding BTree accumulate method. Like the apply method, this method walks > the btree with a function that takes and returns a long argument, this > eliminates iterator allocations without adding helper object allocations > (BTreeRow#hasComplex, BTreeRow#hasInvalidDeletions, BTreeRow#dataSize, > BTreeRow#unsharedHeapSizeExcludingData, Rows#collectStats, > UnfilteredSerializer#serializedRowBodySize) as well as eliminating the > allocation of helper objects in places where apply was used previously^[1]^. > • Create map of columns in SerializationHeader, this lets us avoid > allocating a btree search iterator for each row we serialize. > These optimizations reduce garbage created during compaction by up to 13.5% > > [1] the memory test does measure memory allocated by lambdas capturing objects -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org