[ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470518#comment-16470518 ]
David Smiley commented on SOLR-12338:
-------------------------------------
This OrderedExecutor thing is nifty. It needs class-level documentation.
I have doubts about the use of a {{new ArrayBlockingQueue<>(1)}} per doc ID hash
bucket. What if the client adds a Runnable for doc1, then immediately adds
another Runnable for doc1? You're intending for the second Runnable to block
until the first completes, to achieve per-doc-ID serialization. But this may
not happen: a thread may start on the first Runnable (which frees up the second
Runnable to be submitted), then the thread doesn't get CPU time, and then the
other Runnable zooms ahead out of order. See what I mean?
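To make the interleaving concrete, here is a deterministic reproduction. It assumes the patch's scheme works roughly like this: execute() puts a token into the bucket's one-slot queue (blocking while it is full), and the pool thread takes the token as soon as it starts the task. That is a hypothetical reconstruction, not the patch's code; QueueSlotRace and runScenario() are made-up names, and the latches only force the "thread doesn't get CPU time" timing that would otherwise happen by chance.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical one-slot-queue-per-bucket scheme, forced into the bad interleaving. */
final class QueueSlotRace {

  static List<String> runScenario() throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    BlockingQueue<Object> slot = new ArrayBlockingQueue<>(1); // doc1's hash bucket
    List<String> applied = Collections.synchronizedList(new ArrayList<>());
    CountDownLatch firstStarted = new CountDownLatch(1);
    CountDownLatch secondApplied = new CountDownLatch(1);

    // Submit update-1: claim the slot, then hand the work to the pool.
    slot.put(Boolean.TRUE);
    pool.execute(() -> {
      try {
        slot.take();              // slot is freed as soon as a thread picks the task up...
        firstStarted.countDown();
        secondApplied.await();    // ...then this thread stalls before doing the work
        applied.add("update-1");
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    // Submit update-2 for the same doc: put() no longer blocks because the slot is empty.
    firstStarted.await();
    slot.put(Boolean.TRUE);
    pool.execute(() -> {
      try {
        slot.take();
        applied.add("update-2");
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        secondApplied.countDown();
      }
    });

    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
    return applied;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(runScenario()); // prints [update-2, update-1]
  }
}
```

The second update for doc1 is applied before the first, even though the submitter used the one-slot queue exactly as intended.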
Instead of creating a {{new ArrayBlockingQueue<>(1)}} per doc ID hash bucket,
let's create an array of Locks. When execute() is called, it immediately grabs
the bucket's lock, potentially blocking. Then you can submit the provided
Runnable wrapped in a Runnable that unlocks when done. This can be made simpler
by using a {{FutureTask}} subclass that overrides {{done()}}. To be safe, catch a
RejectedExecutionException from execute() and cancel the FutureTask. With this
scheme, you might make the doc ID hash bucket array largish, say 32, even if
there are fewer threads (less contention from accidental hash collisions). A
Lock is lightweight.
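A minimal sketch of the striped scheme, assuming a plain java.util.concurrent.Executor as the delegate; StripedOrderedExecutor, execute(key, command) and demo() are hypothetical names, not Solr's OrderedExecutor API. One deviation from the wording above: a ReentrantLock can only be unlocked by the thread that locked it (unlock() from any other thread throws IllegalMonitorStateException), and it would also let the same submitting thread re-acquire it without blocking. Since the release happens on a pool thread, this sketch swaps in a binary Semaphore per stripe, which any thread may release.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Sketch: same-key tasks run one at a time in submission order; other stripes run in parallel. */
final class StripedOrderedExecutor {
  private final Semaphore[] stripes; // binary semaphores, releasable from any thread
  private final Executor delegate;

  StripedOrderedExecutor(int numStripes, Executor delegate) {
    this.stripes = new Semaphore[numStripes];
    for (int i = 0; i < numStripes; i++) {
      stripes[i] = new Semaphore(1);
    }
    this.delegate = delegate;
  }

  /** Blocks the caller until the previous task in the same stripe has finished, then submits. */
  Future<?> execute(Object key, Runnable command) {
    Semaphore permit = stripes[Math.floorMod(key.hashCode(), stripes.length)];
    permit.acquireUninterruptibly(); // a production version might prefer acquire() + interrupts
    FutureTask<Void> task = new FutureTask<Void>(command, null) {
      @Override
      protected void done() {
        permit.release(); // called on normal completion, exception, or cancellation
      }
    };
    try {
      delegate.execute(task);
    } catch (RejectedExecutionException e) {
      task.cancel(false); // moves the task to done(), which frees the stripe
      throw e;
    }
    return task; // FutureTask captures exceptions thrown by command; inspect via get()
  }

  /** Submits 100 tasks for one key to a 4-thread pool and returns the order they ran in. */
  static List<Integer> demo() {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    StripedOrderedExecutor ordered = new StripedOrderedExecutor(32, pool);
    List<Integer> ran = Collections.synchronizedList(new ArrayList<>());
    for (int i = 0; i < 100; i++) {
      final int seq = i;
      ordered.execute("doc1", () -> ran.add(seq));
    }
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return ran;
  }

  public static void main(String[] args) {
    System.out.println(demo().subList(0, 5)); // prints [0, 1, 2, 3, 4]
  }
}
```

Because the caller acquires the stripe before submitting, same-key tasks from one submitting thread run strictly in submission order, and a rejected submission cannot leak its stripe. The cost is that the submitter blocks while a same-stripe task is still running, which matches the "potentially blocking" behaviour described in the comment.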
The test uses System.currentTimeMillis() but should probably use System.nanoTime(),
which, unlike currentTimeMillis(), is not affected by wall-clock adjustments?
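If the test switches to System.nanoTime(), its Javadoc says two readings should be compared by their difference (t1 - t0 < 0), not t1 < t0, because the values can overflow. Also, nanoTime() does not promise distinct values for back-to-back calls, so timestamps may tie. A small sketch of such a comparison helper; NanoOrdering and strictlyBefore are hypothetical names, not part of the patch.

```java
/** Hypothetical helpers for timestamps taken with System.nanoTime() in a test. */
final class NanoOrdering {

  /** Overflow-safe check that reading a was taken strictly before reading b. */
  static boolean strictlyBefore(long a, long b) {
    return b - a > 0; // compare the difference, never a < b
  }

  public static void main(String[] args) {
    long start = System.nanoTime();
    long end = System.nanoTime();
    // Only differences between readings in the same JVM are meaningful.
    System.out.println("elapsed ns: " + (end - start));
  }
}
```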
> Replay buffering tlog in parallel
> ---------------------------------
>
> Key: SOLR-12338
> URL: https://issues.apache.org/jira/browse/SOLR-12338
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Cao Manh Dat
> Assignee: Cao Manh Dat
> Priority: Major
> Attachments: SOLR-12338.patch, SOLR-12338.patch
>
>
> Since updates with different ids are independent, it is safe to
> replay them in parallel. This will significantly reduce the recovery time of
> replicas in high-load indexing environments.
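The description's key claim is that updates to different ids are independent, so only per-id order has to be preserved during replay. A minimal sketch of one simple way to exploit that, using hash-partitioning onto single-threaded lanes (a swapped-in technique for illustration, not the patch's OrderedExecutor). PartitionedReplay, Update and replay() are hypothetical names, and real tlog replay is considerably more involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Sketch: replay a log in parallel while keeping per-id order by pinning each id to one lane. */
final class PartitionedReplay {

  /** Hypothetical stand-in for a buffered tlog entry. */
  static final class Update {
    final String id;
    final int version;

    Update(String id, int version) {
      this.id = id;
      this.version = version;
    }
  }

  /** Replays log on numLanes threads; returns, per id, the versions in the order applied. */
  static Map<String, List<Integer>> replay(List<Update> log, int numLanes) {
    ExecutorService[] lanes = new ExecutorService[numLanes];
    for (int i = 0; i < numLanes; i++) {
      lanes[i] = Executors.newSingleThreadExecutor(); // one thread, FIFO queue
    }
    Map<String, List<Integer>> applied = new ConcurrentHashMap<>();
    for (Update u : log) {
      // Same id -> same lane -> applied in log order; different ids may proceed in parallel.
      int lane = Math.floorMod(u.id.hashCode(), numLanes);
      lanes[lane].execute(() -> applied
          .computeIfAbsent(u.id, k -> Collections.synchronizedList(new ArrayList<>()))
          .add(u.version));
    }
    for (ExecutorService lane : lanes) {
      lane.shutdown();
    }
    try {
      for (ExecutorService lane : lanes) {
        lane.awaitTermination(10, TimeUnit.SECONDS);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return applied;
  }

  public static void main(String[] args) {
    List<Update> log = new ArrayList<>();
    for (int v = 1; v <= 3; v++) {
      for (String id : new String[] {"doc1", "doc2", "doc3"}) {
        log.add(new Update(id, v)); // interleaved ids, as in a buffered tlog
      }
    }
    System.out.println(replay(log, 2).get("doc1")); // prints [1, 2, 3]
  }
}
```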
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)