[ 
https://issues.apache.org/jira/browse/SOLR-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15998613#comment-15998613
 ] 

Erick Erickson commented on SOLR-10524:
---------------------------------------

So if I were preparing an "executive summary", there would be several 
take-aways:

1> The number of update state operations, i.e. the number of times state is 
actually written to ZK, is drastically lower under heavy load: by a factor of 
almost 400!

2> One implication here is that the number of state change notifications that 
ZK has to send out, and thus the number of times the state gets read by Solr 
nodes, is _also_ decreased by that same factor. So the fact that the throughput 
of state-read operations is the same should be evaluated in light of the fact 
that there will be many fewer of them.

3> One thing not captured by the numbers is that the size of the Overseer queue 
is much less likely to spin out of control, due to both <2> and the fact that 
we're reading/ordering/processing batches of up to 10,000 messages at once.

4> Even though some of the throughput numbers haven't changed (am_i_leader for 
instance), those operations will spend much less time waiting to be carried out 
due to 1-3. Plus, three points may make a circle, but they aren't enough data 
to make a good generalization from ;)
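
To make 1> and 3> concrete, here is a minimal sketch of the batching idea. It 
is NOT the code in the attached patch: the class, the Message type and the 
in-memory "store" map are all hypothetical stand-ins, and the single counter 
increment per collection only simulates what would be one state write to ZK. 
It assumes messages can be bucketed by collection and that only the final 
state of each replica in a batch needs to be persisted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch; none of these names come from the Solr code base.
class OverseerBatchSketch {

    // Simplified state-update message: the collection it touches,
    // the replica it concerns, and that replica's new state.
    static final class Message {
        final String collection;
        final String replica;
        final String state;

        Message(String collection, String replica, String state) {
            this.collection = collection;
            this.replica = replica;
            this.state = state;
        }
    }

    // Drain up to maxBatch messages in queue order and bucket them by collection.
    static Map<String, List<Message>> drainAndPartition(Queue<Message> queue, int maxBatch) {
        Map<String, List<Message>> buckets = new LinkedHashMap<>();
        for (int i = 0; i < maxBatch; i++) {
            Message m = queue.poll();
            if (m == null) {
                break; // queue is empty
            }
            buckets.computeIfAbsent(m.collection, k -> new ArrayList<>()).add(m);
        }
        return buckets;
    }

    // Process one batch. Every message in a bucket is applied to an in-memory
    // view of the collection, then the collection is "written" once.
    // Returns the number of simulated state writes.
    static int processBatch(Queue<Message> queue, int maxBatch,
                            Map<String, Map<String, String>> store) {
        Map<String, List<Message>> buckets = drainAndPartition(queue, maxBatch);
        int writes = 0;
        for (Map.Entry<String, List<Message>> e : buckets.entrySet()) {
            Map<String, String> replicas =
                store.computeIfAbsent(e.getKey(), k -> new LinkedHashMap<>());
            for (Message m : e.getValue()) {
                replicas.put(m.replica, m.state); // later messages win
            }
            writes++; // stands in for a single write of this collection's state
        }
        return writes;
    }
}
```

With 1,000 queued messages spread over 4 collections, one call to 
processBatch with a batch size of 10,000 performs 4 simulated writes instead 
of 1,000, which is the kind of reduction 1> is describing.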

Is this fair? Accurate? Complete? I'm looking for something to present to those 
users who have seen the Overseer queue grow to hundreds of thousands of 
messages, effectively making their cluster unusable.

Thanks for this work! As collections get larger and larger this has become a 
very significant pain-point.

> Explore in-memory partitioning for processing Overseer queue messages
> ---------------------------------------------------------------------
>
>                 Key: SOLR-10524
>                 URL: https://issues.apache.org/jira/browse/SOLR-10524
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Erick Erickson
>         Attachments: SOLR-10524.patch, SOLR-10524.patch, SOLR-10524.patch, 
> SOLR-10524.patch
>
>
> There are several JIRAs (I'll link in a second) about trying to be more 
> efficient about processing overseer messages as the overseer can become a 
> bottleneck, especially with very large numbers of replicas in a cluster. One 
> of the approaches mentioned near the end of SOLR-5872 (15-Mar) was to "read 
> large no:of items say 10000. put them into in memory buckets and feed them 
> into overseer....".
> This JIRA is to break out that part of the discussion as it might be an easy 
> win whereas "eliminating the Overseer queue" would be quite an undertaking.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

