[jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel

Mark Miller (JIRA) Thu, 10 May 2018 21:32:32 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471481#comment-16471481
 ]


Mark Miller commented on SOLR-12338:
------------------------------------

{noformat}
+  private OrderedExecutor replayUpdatesExecutor = new OrderedExecutor(
+      Runtime.getRuntime().availableProcessors(),
+      ExecutorUtil.newMDCAwareCachedThreadPool(
+          Runtime.getRuntime().availableProcessors(),
+          new DefaultSolrThreadFactory("replayUpdatesExecutor")));
{noformat}

Given that some machines these days have dozens of cores and you might have 
many SolrCores recovering at once, we may want to cap the number of threads at 
some fixed limit or make it configurable.
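
As a rough sketch of what a cap could look like (the property name and the 
default of 8 below are made up for illustration, they are not existing Solr 
settings, and this is not taken from the patch):

```java
final class ReplayThreadCap {
    // Hypothetical system property and default; not real Solr configuration.
    static final String MAX_THREADS_PROP = "replay.updates.maxThreads";
    static final int DEFAULT_CAP = 8;

    /** Returns min(availableProcessors, cap), but never less than 1. */
    static int cappedThreads(int availableProcessors, int cap) {
        return Math.max(1, Math.min(availableProcessors, cap));
    }

    /** Reads the (hypothetical) property and applies the cap to this JVM's core count. */
    static int configuredThreads() {
        int cap = Integer.getInteger(MAX_THREADS_PROP, DEFAULT_CAP);
        return cappedThreads(Runtime.getRuntime().availableProcessors(), cap);
    }
}
```

The result of configuredThreads() could then be passed in both places where 
the snippet above calls Runtime.getRuntime().availableProcessors().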

bq. This seems can be solve by set the fair flag of ArrayBlockingQueue to true

Yeah, you need that to ensure FIFO.
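
For reference, the fair flag is the second constructor argument of 
ArrayBlockingQueue. A minimal sketch (the class and method names here are 
just for illustration, and the capacity is up to the caller):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class FairQueueSketch {
    /**
     * Bounded queue with fair=true: threads blocked on put()/take() are
     * served in FIFO order instead of an unspecified order.
     */
    static <T> BlockingQueue<T> newFairQueue(int capacity) {
        return new ArrayBlockingQueue<>(capacity, true);
    }
}
```

The bounded capacity is also what gives the throttling: once the queue is 
full, put() blocks and offer() returns false until a slot frees up.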

I like how this gives us some control to throttle. I wonder how efficient it is 
as documents keep thundering in, though: do we gobble up threads and 
connections while waiting? That is where it's a bummer that it's hard to limit 
those resources. What are you going to do, though? Those requests have to wait 
somewhere or we have to start dropping them, and hopefully with NIO2 it's 
somewhat efficient to wait on IO.

I think what David is getting at is that you are ensuring that tasks are kicked 
off in order, but once they are kicked off, you can't guarantee order. So task 1 
gets taken off the queue, then task 2 is taken, and now task 2 gets executed 
first when task 1 has its thread unluckily scheduled by the OS. At least that's 
how I read it. But that is not an issue, right? Because you don't run an item 
from the queue until the one in front of it has fully run, right?
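
To make that guarantee concrete, here is a minimal sketch of one way to get 
it. This is not the OrderedExecutor from the patch; KeyedSerialExecutor and 
its methods are hypothetical names, and it chains CompletableFutures per key 
instead of using a queue. A task for a given id only starts after the 
previously submitted task for the same id has fully finished, while tasks for 
different ids can run in parallel:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class KeyedSerialExecutor {
    private final ExecutorService delegate;
    // Last task submitted so far for each key. Entries are never removed in
    // this sketch, so a real implementation would need cleanup or striping.
    private final Map<String, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

    KeyedSerialExecutor(ExecutorService delegate) {
        this.delegate = delegate;
    }

    /** Runs task on the delegate pool, but only after the previous task for the same key is done. */
    CompletableFuture<Void> submit(String key, Runnable task) {
        // compute() is atomic per key, so chaining happens in submission order.
        return tails.compute(key, (k, tail) ->
            (tail == null ? CompletableFuture.<Void>completedFuture(null) : tail)
                .handle((v, err) -> null)        // a failed task must not block later ones
                .thenRunAsync(task, delegate));  // starts only once the previous stage completes
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        KeyedSerialExecutor executor = new KeyedSerialExecutor(pool);
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        CompletableFuture<Void> last = null;
        for (int i = 0; i < 5; i++) {
            final int n = i;
            last = executor.submit("doc-1", () -> seen.add(n));
        }
        last.join(); // the last task for "doc-1" finishing implies the earlier ones finished
        pool.shutdown();
        System.out.println(seen);
    }
}
```

With this shape the OS scheduler cannot reorder two updates to the same id, 
because the second one is not handed to a thread until the first has returned.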


> Replay buffering tlog in parallel
> ---------------------------------
>
>                 Key: SOLR-12338
>                 URL: https://issues.apache.org/jira/browse/SOLR-12338
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>            Priority: Major
>         Attachments: SOLR-12338.patch, SOLR-12338.patch
>
>
> Since updates with different ids are independent, it is safe to replay them 
> in parallel. This will significantly reduce the recovery time of replicas in 
> high-load indexing environments. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
