[
https://issues.apache.org/jira/browse/SOLR-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997903#comment-15997903
]
Shalin Shekhar Mangar commented on SOLR-10524:
----------------------------------------------
# The tmp list in the sortItems method should also be a LinkedList; otherwise,
tmp.remove(0) becomes expensive.
# I ran the OverseerTest#testPerformance method, which simulates a worst-case
scenario of 20000 mixed collection updates, and it shows that {{update_state}}
invocations drop by two orders of magnitude, from 20011 to 131.
# However, the overall time does not change much: it drops from 3m 3s 531ms
without the patch to 2m 53s 282ms with it. Presumably, once real-world latencies
between the overseer and ZK are accounted for, the difference should be larger.
I'd like us to benchmark this with a remote ZK host to see how much this patch
increases overseer throughput.
# This patch processes messages in an order different from the state update
queue but always removes the first element. This is wrong and can cause a lot of
problems in the cluster if the overseer fails over and restarts processing. We
must remove the message that was actually processed.
# Also, now that the order of processing is different, we must have tests that
assert that the right items are removed from the queue at all times, even during
overseer restarts. The bar of testing for this kind of change has to be very
high!
# Is all the re-sorting logic even necessary? It seems that the intention is to
work around the batching logic inside ZkStateWriter. Why not remove the batching
logic (when switching between collections) from ZkStateWriter altogether? It
would simplify both places.
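To make the point about removal concrete, here is a minimal sketch of removing exactly the processed messages when they are handled out of queue order. It is not the patch or the actual Overseer code: the queue is modeled as an in-memory LinkedHashMap instead of ZooKeeper, and the names Message, groupByCollection, processAndRemove and sampleQueue are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ProcessedRemovalSketch {

    /** Hypothetical queued state update: a queue node id plus its target collection. */
    static final class Message {
        final String id;
        final String collection;

        Message(String id, String collection) {
            this.id = id;
            this.collection = collection;
        }
    }

    /** Reorders messages so that updates to the same collection are adjacent. */
    static List<Message> groupByCollection(Iterable<Message> queue) {
        Map<String, List<Message>> buckets = new LinkedHashMap<>();
        for (Message m : queue) {
            buckets.computeIfAbsent(m.collection, k -> new ArrayList<>()).add(m);
        }
        List<Message> ordered = new ArrayList<>();
        for (List<Message> bucket : buckets.values()) {
            ordered.addAll(bucket);
        }
        return ordered;
    }

    /**
     * Processes at most `limit` messages in grouped order (a small limit stands in
     * for an overseer that stops part-way through) and removes exactly those
     * messages from the queue. Returns the ids that were processed.
     */
    static List<String> processAndRemove(Map<String, Message> queueById, int limit) {
        List<Message> ordered = groupByCollection(new ArrayList<>(queueById.values()));
        List<String> processed = new ArrayList<>();
        for (Message m : ordered.subList(0, Math.min(limit, ordered.size()))) {
            processed.add(m.id); // a real overseer would apply the state update here
        }
        for (String id : processed) {
            queueById.remove(id); // remove by id, never by queue position
        }
        return processed;
    }

    /** Sample queue in arrival order: q1(A), q2(B), q3(A), q4(B). */
    static Map<String, Message> sampleQueue() {
        Map<String, Message> queue = new LinkedHashMap<>();
        String[][] items = {{"q1", "A"}, {"q2", "B"}, {"q3", "A"}, {"q4", "B"}};
        for (String[] item : items) {
            queue.put(item[0], new Message(item[0], item[1]));
        }
        return queue;
    }

    public static void main(String[] args) {
        Map<String, Message> queue = sampleQueue();
        List<String> processed = processAndRemove(queue, 2);
        System.out.println("processed=" + processed + " remaining=" + queue.keySet());
    }
}
```

With the sample queue and a limit of 2, q1 and q3 are processed and q2 and q4 remain. Removing the first two queue elements instead would leave q3 and q4, so q2 would be lost and q3 would be processed again after a restart.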
> Explore in-memory partitioning for processing Overseer queue messages
> ---------------------------------------------------------------------
>
> Key: SOLR-10524
> URL: https://issues.apache.org/jira/browse/SOLR-10524
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Erick Erickson
> Attachments: SOLR-10524.patch, SOLR-10524.patch, SOLR-10524.patch
>
>
> There are several JIRAs (I'll link in a second) about trying to be more
> efficient about processing overseer messages as the overseer can become a
> bottleneck, especially with very large numbers of replicas in a cluster. One
> of the approaches mentioned near the end of SOLR-5872 (15-Mar) was to "read
> large no:of items say 10000. put them into in memory buckets and feed them
> into overseer....".
> This JIRA is to break out that part of the discussion as it might be an easy
> win whereas "eliminating the Overseer queue" would be quite an undertaking.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)