[ 
https://issues.apache.org/jira/browse/SOLR-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194970#comment-16194970
 ] 

Shawn Heisey commented on SOLR-10265:
-------------------------------------

Guava has something really cool that I think we could use for a multi-threaded 
overseer, and it is available in the 14.0.1 version that Solr currently 
includes:

https://google.github.io/guava/releases/14.0/api/docs/com/google/common/util/concurrent/Striped.html

If we use the name of the collection as the stripe input, then we will have 
individual locks for each collection, so we can guarantee order of operations.  
If there are any Overseer operations that are cluster-wide and not applicable 
to one collection, we probably need to come up with a special name for those.

NB: It's possible that different collection names might hash to the same stripe 
and therefore block each other, but that would hopefully be a rare occurrence.

Probably what should happen is that each thread would grab an operation from 
the queue, and attempt to acquire the lock for the collection mentioned in that 
operation.  If a ton of operations happen that all apply to one collection, 
then it would still act like the current single-thread implementation.  One 
thing that I do not know is whether the Java "Lock" implementation (which is 
what is obtained from the Striped class) guarantees what order the threads will 
be granted the lock.  The hope is of course that locks will be granted in the 
order they are requested when several threads are all attempting to use the 
same lock.

> Overseer can become the bottleneck in very large clusters
> ---------------------------------------------------------
>
>                 Key: SOLR-10265
>                 URL: https://issues.apache.org/jira/browse/SOLR-10265
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Varun Thacker
>
> Let's say we have a large cluster. Some numbers:
> - To ingest the data at the volume we want to I need roughly a 600 shard 
> collection.
> - Index into the collection for 1 hour and then create a new collection 
> - For a 30 days retention window with these numbers we would end up wth  
> ~400k cores in the cluster
> - Just a rolling restart of this cluster can take hours because the overseer 
> queue gets backed up. If a few nodes looses connectivity to ZooKeeper then 
> also we can end up with lots of messages in the Overseer queue
> With some tests here are the two high level problems we have identified:
> 1> How fast can the overseer process operations:
> The rate at which the overseer processes events is too slow at this scale. 
> I ran {{OverseerTest#testPerformance}} which creates 10 collections ( 1 shard 
> 1 replica ) and generates 20k state change events. The test took 119 seconds 
> to run on my machine which means ~170 events a second. Let's say a server can 
> process 5x of my machine so 1k events a second. 
> Total events generated by a 400k replica cluster = 400k * 4 ( state changes 
> till replica become active ) = 1.6M / 1k events a second will be 1600 minutes.
> Second observation was that the rate at which the overseer can process events 
> slows down when the number of items in the queue gets larger
> I ran the same {{OverseerTest#testPerformance}} but changed the number of 
> events generated to 2000 instead. The test took only 5 seconds to run. So it 
> was a lot faster than the test run which generated 20k events
> 2> State changes overwhelming ZK:
> For every state change Solr is writing out a big state.json to zookeeper. 
> This can lead to the zookeeper transaction logs going out of control even 
> with auto purging etc set . 
> I haven't debugged why the transaction logs ran into terabytes without taking 
> into snapshots but this was my assumption based on the other problems we 
> observed



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to