[ 
https://issues.apache.org/jira/browse/SOLR-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938687#comment-13938687
 ] 

Mark Miller commented on SOLR-5872:
-----------------------------------

{noformat}
Some things that are in the clusterstate now and that could be in the future 
are not so easily handled with the non overseer strategy - like marking who is 
the leader. You have to have the Overseer running its own special thread to 
inject and remove information.
{noformat}

To expand on this one a bit: you can obviously have each node essentially do 
what the Overseer does now. But knowing the true shard leader means things 
like going to ZooKeeper. So for a large cluster, each node takes on all the 
duties of the Overseer, every node is now hitting ZooKeeper for this and that, 
and each node is trying to update clusterstate.json at the same time and 
retrying. You end up with a contentious herd piling onto this one ZooKeeper 
node.

The Overseer was seen as a fairly elegant way to avoid this herd effect and 
provide a less chatty solution. Rather than all the retries and reading the 
state on every state change, everyone writes to a non-contentious ZK node, and 
the Overseer batches up the info and writes out the state.
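
A rough sketch of that batching idea, in plain Java with no ZooKeeper 
involved. Everything here is made up for illustration (the class names, the 
in-memory queue, the write counter); it is not the actual Overseer code, just 
the shape of "many small writes in, one state write out":

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;

/** One small state update, as a node would publish it (hypothetical type). */
final class StateUpdate {
    final String replica;
    final String state;

    StateUpdate(String replica, String state) {
        this.replica = replica;
        this.state = state;
    }
}

/** In-memory stand-in for the queue plus the single batching consumer. */
final class OverseerBatchSketch {
    // Producers only append here, so they never contend on the state itself.
    private final ConcurrentLinkedQueue<StateUpdate> queue = new ConcurrentLinkedQueue<>();
    // Stand-in for the cluster state; only the batching consumer touches it.
    private final Map<String, String> clusterState = new LinkedHashMap<>();
    // Counts how often the (simulated) state would be written out.
    private int stateWrites = 0;

    /** Called by any node: publish an update without reading the state. */
    void offer(String replica, String state) {
        queue.add(new StateUpdate(replica, state));
    }

    /** Drains everything pending, applies it in order, and writes the state once. */
    synchronized int processBatch() {
        int applied = 0;
        StateUpdate update;
        while ((update = queue.poll()) != null) {
            clusterState.put(update.replica, update.state);
            applied++;
        }
        if (applied > 0) {
            stateWrites++; // one write for the whole batch
        }
        return applied;
    }

    synchronized int stateWrites() {
        return stateWrites;
    }

    synchronized String stateOf(String replica) {
        return clusterState.get(replica);
    }

    public static void main(String[] args) {
        OverseerBatchSketch overseer = new OverseerBatchSketch();
        overseer.offer("core1", "recovering");
        overseer.offer("core2", "active");
        overseer.offer("core1", "active");
        int applied = overseer.processBatch();
        System.out.println(applied + " updates applied in "
                + overseer.stateWrites() + " state write(s)");
    }
}
```

Three updates go in and the state is written once, which is the whole point 
of the batching: the cost on the shared node scales with the number of 
batches rather than the number of updates.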

Now if we cannot make it fast enough because of fundamental limitations, that 
is one thing. But gosh, on the surface, these state updates are so small and ZK 
is fairly performant...

We should identify the bottlenecks.

For startup, one random idea is to look at using zk's multi call support to 
read the whole queue in one request and then batch it all.

I've got some other common sense ideas as well, but we will have to find the 
choke points before brainstorming solutions makes much sense.

> Eliminate overseer queue 
> -------------------------
>
>                 Key: SOLR-5872
>                 URL: https://issues.apache.org/jira/browse/SOLR-5872
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>
> The overseer queue is one of the busiest points in the entire system. The 
> raison d'être of the queue is to:
>  * Provide batching of operations for the main clusterstate.json so that 
> state updates are minimized 
>  * Avoid race conditions and ensure order
> Now, as we move the individual collection states out of the main 
> clusterstate.json, the batching is not useful anymore.
> Race conditions can easily be solved by using a compare and set in ZooKeeper. 
> The proposed solution is: whenever an operation is required to be performed 
> on the clusterstate, the same thread (and of course the same JVM) will
>  # read the fresh state and version of the ZK node
>  # construct the new state
>  # perform a compare and set
>  # if the compare and set fails, go to step 1
> This should be limited to operations performed on external collections, 
> because batching would still be required for the others.
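
For reference, the four steps proposed in the description can be sketched 
roughly as below. This is only an illustration under stated assumptions: 
VersionedNode is a hypothetical in-memory stand-in for a single ZK node, so 
the example runs without a ZooKeeper server. With the real client the 
equivalent calls would be getData(path, watch, stat) to get the data and 
version, and setData(path, data, version), which throws 
KeeperException.BadVersionException when the version no longer matches.

```java
import java.util.function.UnaryOperator;

/** Hypothetical in-memory stand-in for one versioned ZooKeeper node. */
final class VersionedNode {
    /** Immutable view of the node: its data and the version it was read at. */
    static final class Snapshot {
        final String data;
        final int version;

        Snapshot(String data, int version) {
            this.data = data;
            this.version = version;
        }
    }

    private String data;
    private int version = 0;

    VersionedNode(String initial) {
        this.data = initial;
    }

    synchronized Snapshot read() {
        return new Snapshot(data, version);
    }

    /** Writes only if expectedVersion is still current; returns false otherwise. */
    synchronized boolean compareAndSet(int expectedVersion, String newData) {
        if (expectedVersion != version) {
            return false;
        }
        data = newData;
        version++;
        return true;
    }
}

final class CasUpdateSketch {
    /** Runs the read / build / compare-and-set loop; returns the number of attempts. */
    static int update(VersionedNode node, UnaryOperator<String> change) {
        int attempts = 0;
        while (true) {
            attempts++;
            VersionedNode.Snapshot current = node.read();     // step 1: fresh state and version
            String next = change.apply(current.data);         // step 2: construct the new state
            if (node.compareAndSet(current.version, next)) {  // step 3: compare and set
                return attempts;
            }
            // step 4: the version moved under us, so go back to step 1
        }
    }

    public static void main(String[] args) {
        VersionedNode node = new VersionedNode("a");
        int[] calls = {0};
        // The first invocation sneaks in a competing write to force one retry.
        int attempts = update(node, s -> {
            if (calls[0]++ == 0) {
                node.compareAndSet(node.read().version, s + "!");
            }
            return s + "b";
        });
        VersionedNode.Snapshot result = node.read();
        System.out.println(attempts + " attempts, data=" + result.data
                + ", version=" + result.version);
    }
}
```

Note that the loop is unbounded: under heavy contention on one node every 
writer keeps re-reading and retrying, which is the herd effect described in 
the comment above. A real implementation would presumably want a retry limit 
or backoff.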



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org