[
https://issues.apache.org/jira/browse/CASSANDRA-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010853#comment-17010853
]
Benedict Elliott Smith commented on CASSANDRA-15392:
----------------------------------------------------
MergeIteratorPool:
* As Jordan suggests, storing the pool inside the \{{MergeIterator}} makes
sense - more so because it can avoid incurring the cost of
\{{ThreadLocal.get()}} on \{{returnToPool}}, by stashing a link to an inner
object (that guards put on the identity of the \{{Thread}} owning the queue)
* Perhaps an \{{ArrayList}} is a better idea for collection type, as it will
be generally garbage-free, and occupy less space than a \{{LinkedList}}? An
\{{ArrayDeque}} could maintain the current behaviour, or you could pop from the
back
* Perhaps the queue would be better with fixed size, permitting at most two or
three items to be stored?
* As it happens, I’ve been working on very similar parts of the codebase, and
I have implemented my own \{{TinyThreadLocalPool}} in the patches I hope to
land soon, that does the above but has lower fixed-overhead and less
indirection by permitting at most 3 items, stored in in-line properties (i.e.
\{{val0}}, \{{val1}} and \{{val2}}). It might be worthwhile introducing that
class here for consideration/comparison, and perhaps some amalgamation of the
two used in whichever patch lands first (which I fully expect to be this one)?
{\{MergeIterator.OneToOne}}:
* While we’re here optimising, it probably makes sense to merge \{{OneToOne}}
and \{{TrivialOneToOne}}, simply permitting the reducer in \{{OneToOne}} to be
\{{null}}, and to switch behaviour based on a null check. This should permit
call-sites to become bimorphic that are presently megamorphic, and the branch
should be near perfectly predicted.
{\{MergeIterator.ManyToOne}}:
* Instead of using a separate \{{candidates}} array for reusing a
\{{Candidate}}, could we avoid setting the last heap element to null by instead
placing the now-defunct item there?
* Do we need \{{active}} when it is implied by \{{MergeIterator.iterators !=
null}} and \{{Candidate.iter != null}}?
* Do we need to set primitives to \{{-1}} or \{{false}} on \{{close}}, given
we will throw \{{NullPointerException}} if we haven’t invoked \{{reset}}?
Benchmarking:
* It’s annoying work, but alongside the garbage benchmarks, it would be nice
to see some simple JMH throughput benchmarks introduced, to see what the
overall impact of the change is. I don’t mind helping out here.
nits:
* {\{ManyToOne.DEFAULT_SIZE}} -> \{{ManyToOne.POOLED_MERGE_LIMIT}}?
* +1 Jordan’s nit about naming of \{{getSimple}}
* {\{MergeIteratorPool.queues.initialValue}} should not declare \{{throw
Exception}}
* {\{MergeIteratorPool.initialValue}} looks to be a duplicate
* {\{TokenTree:380}} errant new line
> Pool Merge Iterators
> --------------------
>
> Key: CASSANDRA-15392
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15392
> Project: Cassandra
> Issue Type: Sub-task
> Components: Local/Compaction
> Reporter: Blake Eggleston
> Assignee: Blake Eggleston
> Priority: Normal
> Fix For: 4.0
>
>
> By pooling merge iterators, instead of creating new ones each time we need
> them, we can reduce garbage on the compaction and read paths under relevant
> workloads by ~4% in many cases.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]