[ 
https://issues.apache.org/jira/browse/SOLR-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997903#comment-15997903
 ] 

Shalin Shekhar Mangar commented on SOLR-10524:
----------------------------------------------


# The tmp list in the sortItems method should also be a LinkedList; otherwise, 
tmp.remove(0) becomes expensive.
# I ran the OverseerTest#testPerformance method, which simulates a worst-case 
scenario of 20000 mixed collection updates, and it shows that {{update_state}} 
invocations drop by two orders of magnitude, from 20011 to 131.
# However, the overall time does not change much: it drops from 3m 3s 531ms 
without the patch to 2m 53s 282ms with it. Presumably, when real-world latencies 
between the overseer and ZK are accounted for, the difference should be larger. 
I'd like us to benchmark this with a remote ZK host to see how much this patch 
increases overseer throughput.
# This patch processes messages in an order different from the state update 
queue but always removes the first element. This is wrong and can cause a lot of 
problems in the cluster if the overseer fails over and restarts processing. We 
must remove the message that was actually processed.
# Also, now that the order of processing is different, we must have tests that 
assert that the right items are removed from the queue at all times, even during 
overseer restarts. The bar for testing this kind of change has to be very 
high!
# Is all the re-sorting logic even necessary? It seems that the intention is to 
work around the batching logic inside ZkStateWriter. Why not remove the batching 
logic (when switching between collections) from ZkStateWriter altogether? It 
would simplify both places.
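
To make the point about removing the processed message concrete, here is a 
minimal sketch. It is not the patch and does not use Solr's queue classes: the 
class ReorderedQueueSketch, the QueueItem type, the "qn-" ids and the TreeMap 
standing in for the ZK-backed state update queue are all hypothetical. It only 
shows that once items are regrouped by collection, removal has to go by the id 
of the item that was processed, not by position in the queue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of a reordered queue; not Solr's actual classes. */
public class ReorderedQueueSketch {

    /** A queued message: its queue node id and the collection it targets. */
    static final class QueueItem {
        final String id;
        final String collection;

        QueueItem(String id, String collection) {
            this.id = id;
            this.collection = collection;
        }
    }

    /** Groups items by collection, keeping queue order within each collection. */
    static List<QueueItem> groupByCollection(List<QueueItem> items) {
        Map<String, List<QueueItem>> buckets = new LinkedHashMap<>();
        for (QueueItem item : items) {
            buckets.computeIfAbsent(item.collection, k -> new ArrayList<>()).add(item);
        }
        List<QueueItem> ordered = new ArrayList<>();
        for (List<QueueItem> bucket : buckets.values()) {
            ordered.addAll(bucket);
        }
        return ordered;
    }

    /**
     * Processes up to limit items in grouped order and removes exactly those
     * items from the queue. The queue maps node id to collection name and
     * iterates in id order, standing in for the real queue.
     */
    static List<String> processAndRemove(TreeMap<String, String> queue, int limit) {
        List<QueueItem> snapshot = new ArrayList<>();
        for (Map.Entry<String, String> e : queue.entrySet()) {
            snapshot.add(new QueueItem(e.getKey(), e.getValue()));
        }
        List<String> processed = new ArrayList<>();
        for (QueueItem item : groupByCollection(snapshot)) {
            if (processed.size() == limit) {
                break;
            }
            // ... the state update for this item would be applied here ...
            queue.remove(item.id); // remove by id, not queue.pollFirstEntry()
            processed.add(item.id);
        }
        return processed;
    }

    public static void main(String[] args) {
        TreeMap<String, String> queue = new TreeMap<>();
        queue.put("qn-0001", "collA");
        queue.put("qn-0002", "collB");
        queue.put("qn-0003", "collA");
        List<String> processed = processAndRemove(queue, 2);
        // prints: processed=[qn-0001, qn-0003] remaining=[qn-0002]
        System.out.println("processed=" + processed + " remaining=" + queue.keySet());
    }
}
```

In this toy run, removing the head of the queue twice instead would have 
deleted qn-0001 and qn-0002. The unprocessed qn-0002 would be lost, and the 
already processed qn-0003 would be replayed after an overseer restart.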

> Explore in-memory partitioning for processing Overseer queue messages
> ---------------------------------------------------------------------
>
>                 Key: SOLR-10524
>                 URL: https://issues.apache.org/jira/browse/SOLR-10524
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Erick Erickson
>         Attachments: SOLR-10524.patch, SOLR-10524.patch, SOLR-10524.patch
>
>
> There are several JIRAs (I'll link in a second) about trying to be more 
> efficient about processing overseer messages as the overseer can become a 
> bottleneck, especially with very large numbers of replicas in a cluster. One 
> of the approaches mentioned near the end of SOLR-5872 (15-Mar) was to "read 
> large no:of items say 10000. put them into in memory buckets and feed them 
> into overseer....".
> This JIRA is to break out that part of the discussion as it might be an easy 
> win whereas "eliminating the Overseer queue" would be quite an undertaking.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
