[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

David Smiley (JIRA) Thu, 10 May 2018 08:06:43 -0700

    [ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470518#comment-16470518 ]


David Smiley commented on SOLR-12338:
-------------------------------------

This OrderedExecutor thing is nifty. It needs class-level documentation.
I have doubts about the use of a {{new ArrayBlockingQueue<>(1)}} per doc ID hash
bucket. Suppose the client adds a Runnable for doc1, then immediately adds
another Runnable for doc1. You intend the second submission to block until the
first Runnable completes, which would serialize updates per doc ID. But that
isn't guaranteed: a thread may start on the first Runnable (which frees the slot,
so the second Runnable can be submitted), then that thread doesn't get CPU time,
and another thread runs the second Runnable ahead of it, out of order. See what
I mean?
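The interleaving above can be traced step by step. This is a hypothetical sketch, not the patch: it assumes each hash bucket is a single-slot {{ArrayBlockingQueue}} whose Runnable is handed to whichever pool thread dequeues it, and it simulates "thread A" and "thread B" as explicit steps so the outcome is deterministic. SingleSlotRace and replay() are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical single-threaded replay of the race: one bucket is modelled as an
// ArrayBlockingQueue of capacity 1, and the two pool threads are explicit steps.
public class SingleSlotRace {
    public static List<String> replay() {
        List<String> applied = new ArrayList<>();
        ArrayBlockingQueue<Runnable> bucket = new ArrayBlockingQueue<>(1);

        Runnable first = () -> applied.add("doc1 v1");
        Runnable second = () -> applied.add("doc1 v2");

        // Client submits the first update for doc1; the slot is now full.
        if (!bucket.offer(first)) throw new IllegalStateException("slot should be free");

        // Thread A dequeues it, which frees the slot, but is then descheduled
        // before it actually runs the task.
        Runnable heldByThreadA = bucket.poll();

        // So the client's second submission for doc1 no longer blocks.
        if (!bucket.offer(second)) throw new IllegalStateException("slot should be free");

        // Thread B gets CPU time first and applies the second update.
        bucket.poll().run();

        // Thread A finally runs: the first update is applied last.
        heldByThreadA.run();
        return applied;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints [doc1 v2, doc1 v1]
    }
}
```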

Instead of creating a {{new ArrayBlockingQueue<>(1)}} per doc ID hash bucket,
let's create an array of Locks. When execute() is called, it immediately grabs
the bucket's lock, potentially blocking. Then you can submit the provided
Runnable wrapped in a Runnable that unlocks when done. This can be made simpler
by using a {{FutureTask}} subclass that overrides {{done()}}. To be safe, catch a
RejectedExecutionException from execute() and cancel the FutureTask (cancelling
also invokes done(), so the lock is still released). With this scheme, you might
make the doc ID hash bucket array largish, say 32, even if there are fewer
threads, to reduce contention from accidental hash collisions. A Lock is
lightweight.
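A minimal sketch of the array-of-locks scheme, assuming a single submitting thread (as in tlog replay) and an arbitrary delegate Executor. One adjustment: a {{ReentrantLock}} must be unlocked by the thread that locked it, while here the lock is taken by the caller and released from the pool thread in {{done()}}, so the sketch uses a {{Semaphore}} with one permit per bucket instead. StripedOrderedExecutor is a hypothetical name, not the class in the patch.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: tasks whose keys hash to the same bucket run one at a
 * time, in the order a single caller submits them; other buckets run in parallel.
 */
public class StripedOrderedExecutor {
    private final Executor delegate;
    private final Semaphore[] buckets;

    public StripedOrderedExecutor(Executor delegate, int numBuckets) {
        this.delegate = delegate;
        this.buckets = new Semaphore[numBuckets];
        for (int i = 0; i < numBuckets; i++) {
            // One permit = unlocked. Unlike ReentrantLock, a Semaphore may be
            // released by a different thread than the one that acquired it.
            buckets[i] = new Semaphore(1);
        }
    }

    public void execute(Object key, Runnable command) {
        Semaphore bucket = buckets[Math.floorMod(key.hashCode(), buckets.length)];
        // Block the caller until the previous task in this bucket has finished.
        bucket.acquireUninterruptibly();
        FutureTask<Void> task = new FutureTask<Void>(command, null) {
            @Override
            protected void done() {
                // Runs after normal completion, after an exception, and on cancel().
                bucket.release();
            }
        };
        try {
            delegate.execute(task);
        } catch (RejectedExecutionException e) {
            // cancel() on a task that never ran still calls done(), freeing the bucket.
            task.cancel(false);
            throw e;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        StripedOrderedExecutor ordered = new StripedOrderedExecutor(pool, 32);
        for (int v = 1; v <= 3; v++) {
            final int version = v;
            ordered.execute("doc1", () -> System.out.println("doc1 v" + version));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

Because the Runnable runs inside a FutureTask, an exception it throws is captured in the task rather than reaching the pool's uncaught-exception handler, so a real implementation would probably want to log it in done().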

The test uses System.currentTimeMillis() but should probably use System.nanoTime(),
which is meant for measuring elapsed time and isn't affected by wall-clock
adjustments?
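A sketch of a timestamp-based ordering check with System.nanoTime(). Two details from the System.nanoTime() javadoc matter: values are only comparable within one JVM, and they should be compared as {{t1 - t0 < 0}} rather than {{t1 < t0}} because of possible numerical overflow. Back-to-back calls may still return equal values, so the check is "not after" rather than "strictly before". A single-thread executor stands in for the executor under test; NanoOrderingCheck and its methods are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical test helper for timestamp-based ordering checks. */
public class NanoOrderingCheck {
    /**
     * True if nanoTime value t0 is not later than t1. Compares the difference,
     * as the System.nanoTime() javadoc recommends, so overflow is tolerated.
     */
    public static boolean notAfter(long t0, long t1) {
        return t1 - t0 >= 0;
    }

    /** Runs two tasks and checks the second did not start before the first ended. */
    public static boolean secondStartsAfterFirstEnds() throws InterruptedException {
        // Slots: [start of task 0, end of task 0, start of task 1, end of task 1]
        AtomicLongArray stamps = new AtomicLongArray(4);
        // Stand-in for the executor under test.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        for (int i = 0; i < 2; i++) {
            final int base = 2 * i;
            executor.execute(() -> {
                stamps.set(base, System.nanoTime());
                // ... the work being ordered would go here ...
                stamps.set(base + 1, System.nanoTime());
            });
        }
        executor.shutdown();
        if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("tasks did not finish");
        }
        return notAfter(stamps.get(1), stamps.get(2));
    }
}
```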

> Replay buffering tlog in parallel
> ---------------------------------
>
>                 Key: SOLR-12338
>                 URL: https://issues.apache.org/jira/browse/SOLR-12338
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>            Priority: Major
>         Attachments: SOLR-12338.patch, SOLR-12338.patch
>
>
> Since updates with different ids are independent, it is safe to replay them
> in parallel. This will significantly reduce the recovery time of replicas in
> high-load indexing environments.
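One simple way to get the per-id ordering the description relies on, shown for illustration rather than as the patch's approach: route every buffered update for an id to the same single-threaded lane, so updates for one id are applied in log order while different lanes run concurrently. It assumes each update touches exactly one id; ParallelReplay, Update and replay() are hypothetical names, and a map stands in for the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: parallel replay that preserves log order per id. */
public class ParallelReplay {
    /** A buffered update: a document id and the value it sets (hypothetical type). */
    public static final class Update {
        final String id;
        final String value;

        public Update(String id, String value) {
            this.id = id;
            this.value = value;
        }
    }

    public static Map<String, String> replay(List<Update> log, int lanes) throws InterruptedException {
        Map<String, String> index = new ConcurrentHashMap<>();
        List<ExecutorService> executors = new ArrayList<>();
        for (int i = 0; i < lanes; i++) {
            executors.add(Executors.newSingleThreadExecutor());
        }
        // Same id -> same lane -> applied in log order; different lanes run concurrently.
        for (Update u : log) {
            int lane = Math.floorMod(u.id.hashCode(), lanes);
            executors.get(lane).execute(() -> index.put(u.id, u.value));
        }
        for (ExecutorService e : executors) {
            e.shutdown();
        }
        for (ExecutorService e : executors) {
            if (!e.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("replay did not finish");
            }
        }
        return index;
    }
}
```

Compared with the lock-per-bucket scheme in the comment, the submitting thread never blocks here, at the cost of unbounded lane queues (Executors.newSingleThreadExecutor() uses an unbounded work queue).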



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
