[
https://issues.apache.org/jira/browse/CASSANDRA-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jordan West updated CASSANDRA-15392:
------------------------------------
Status: Changes Suggested (was: Review In Progress)
MergeIteratorPool:
* #initialValue() isn’t used or at a minimum doesn’t need {{throws
Exception}}
MergeIterator:
* {{#returnToPool()}}: Instead of hard-coding each concrete class to their
specific pools, consider adding a pool reference to MergeIterator and defining
{{#returnToPool()}} as {{(if (pool != null) pool.put(this))}}. The only
downside I can see here is the additional field will add to the memory overhead
of MergeIterator which is antithesis to this patch and it already adds a field.
On the plus side it could make future tests easier, reduces reliance on more
statics, and cleans up the code a bit. It would also allow for a version where
we don’t rely on type erasure in the future.
* Really minor nit: the naming of {{#getSimple()}} made me think of
{{TrivialOneToOne}} iterators since they are synonymous. Consider
{{getInternal() }}/ {{getFromPool() }}/ {{getPooled()}} instead.
* ManyToOne.DEFAULT_SIZE could be tunable so if we do find a pathological
workload in the future it doesn’t require a re-compile to address
* If my understanding is correct, we keep the {{candidates}} array in
{{ManyToOne}} so we can {{#close()}} them all so they can be recycled. If thats
true, I think it may be possible to get rid of the {{candidates}} array in
{{ManyToOne}} by doing the following: 1) Change Candidate#advance to return a
boolean. 2) Pass that boolean to {{ManyToOne#replaceAndSink}} to use in the
first if block — where {{Candidate#close()}} is called. This would allow the
remaining heap to be iterated over, instead of the {{candidates}} array in
{{ManyToOne#close()}}.
SASI Changes:
* While I don’t see anything wrong with these changes they might detract from
the efficacy of this patch when SASI is being used. In particular, the
MergeIterator created by OnDiskToken#iterator() is almost always of size 1 (its
only >=2 if there is a DecoratedKey hash collision). This means we are almost
always wasting a {{TrivialOneToOne}} iterator when we don’t need one and in the
case where there is a collision we will use a {{ManyToOne}} whose
{{candidates}} array is 30 elements too large. I’d add a {{if (keys.size() ==
1) returns keys.get(0)}} and bypass the pool for the rare case a collision
causes a {{ManyToOne}} to be instantiated.
Other:
* I can imagine an uncommon workloads / configuration that compacted and read
infrequently that could benefit from the queues being able to shrink. Do you
think its worth exploring or benchmarking?
> Pool Merge Iterators
> --------------------
>
> Key: CASSANDRA-15392
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15392
> Project: Cassandra
> Issue Type: Sub-task
> Components: Local/Compaction
> Reporter: Blake Eggleston
> Assignee: Blake Eggleston
> Priority: Normal
> Fix For: 4.0
>
>
> By pooling merge iterators, instead of creating new ones each time we need
> them, we can reduce garbage on the compaction and read paths under relevant
> workloads by ~4% in many cases.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]