[ https://issues.apache.org/jira/browse/SOLR-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940787#comment-13940787 ]

Mark Miller commented on SOLR-5872:
-----------------------------------

bq. Is that dead in the water now?

No. It's got its own issue, and to me it seems likely to happen.

Even this issue is not "dead in the water". Things are generally determined via 
discussion and consensus. I'm arguing that we should look at simple performance 
bottlenecks and improvements to the current system; there seems to be a lot of 
low-hanging fruit.

{noformat}
Can you throw some light on what the ZK schema was for your initial impl? If all 
nodes of a given slice are under one ZK directory, one watch on the parent 
should be fine, right?
{noformat}

It's been a long time and we had a few variations, so I'd have to go back to 
the code to refresh my memory. For now, from memory:

Initially I had it so that we simply watched the parent. Loggly ran into 
performance issues with this: even when only one entry changed, they had so 
many entries that, with so many nodes re-reading so many entries on every 
update, performance was a big problem for them. They hacked around it 
initially, and eventually we moved to watching each entry, which made updating 
state for small changes very efficient. But then another big early user still 
hit performance issues simply from having to read so many entries on startup 
and such. That is what prompted the move to a single clusterstate.json.
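
A back-of-the-envelope model may make these tradeoffs concrete. This is a 
hypothetical sketch, not Solr code: it assumes every node tracks the full 
cluster state, that after a parent-level notification each node re-reads every 
entry (as described for Loggly's setup), and it ignores the cost of listing 
children. `Cost`, the layout names, and the cluster size are illustrative.

```python
from dataclasses import dataclass

# Hypothetical cost model, not Solr code. Assumes every node tracks the full
# cluster state; ignores the getChildren call and actual payload sizes.


@dataclass(frozen=True)
class Cost:
    znode_reads: int   # ZooKeeper read calls across the whole cluster
    entries_read: int  # state entries transferred (a rough proxy for bytes)


def cost_of_one_change(layout: str, nodes: int, entries: int) -> Cost:
    """Cluster-wide read cost when a single state entry changes."""
    if layout == "watch_parent":
        # Assumption: after a parent-level notification each node re-reads
        # every entry, because it does not know which one changed.
        return Cost(nodes * entries, nodes * entries)
    if layout == "watch_each_entry":
        # Only the watch on the changed entry fires; each node reads one znode.
        return Cost(nodes, nodes)
    if layout == "single_clusterstate_json":
        # One read per node, but the single file carries every entry.
        return Cost(nodes, nodes * entries)
    raise ValueError(f"unknown layout: {layout}")


def cost_of_startup(layout: str, nodes: int, entries: int) -> Cost:
    """Cluster-wide read cost when every node loads the full state once."""
    if layout in ("watch_parent", "watch_each_entry"):
        # One znode per entry, so each node issues one read per entry.
        return Cost(nodes * entries, nodes * entries)
    if layout == "single_clusterstate_json":
        # One read per node, each carrying every entry.
        return Cost(nodes, nodes * entries)
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    nodes, entries = 100, 1000  # illustrative cluster size
    for layout in ("watch_parent", "watch_each_entry", "single_clusterstate_json"):
        print(layout, cost_of_one_change(layout, nodes, entries),
              cost_of_startup(layout, nodes, entries))
```

With 100 nodes and 1,000 entries, per-entry watches cut the cost of a single 
change from 100,000 reads to 100, but startup still takes 100,000 reads; the 
single file cuts startup to 100 reads, at the price of moving every entry to 
every node on each change.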

It's hard to remember it all perfectly; the info is spread across a lot of old 
JIRAs. None of the changes were taken lightly, and a variety of developers and 
contributors were generally involved in the discussion or motivated changes 
through their needs.

There are tradeoffs with all of these approaches.

> Eliminate overseer queue 
> -------------------------
>
>                 Key: SOLR-5872
>                 URL: https://issues.apache.org/jira/browse/SOLR-5872
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>
> The overseer queue is one of the busiest points in the entire system. The 
> raison d'être of the queue is to:
> * Provide batching of operations for the main clusterstate.json so that 
> state updates are minimized
> * Avoid race conditions and ensure order
> Now, as we move the individual collection states out of the main 
> clusterstate.json, the batching is not useful anymore.
> Race conditions can easily be solved by using a compare and set in ZooKeeper. 
> The proposed solution is that whenever an operation needs to be performed 
> on the clusterstate, the same thread (and of course the same JVM) will:
> # read the fresh state and version of the zk node
> # construct the new state
> # perform a compare and set
> # if the compare and set fails, go to step 1
> This should be limited to operations performed on external collections, 
> because batching would still be required for the others.
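
The compare-and-set loop proposed above can be sketched as follows. In the 
ZooKeeper Java client, setData(path, data, version) fails with 
KeeperException.BadVersionException if the znode has been written since that 
version was read; the sketch uses a hypothetical in-memory `VersionedNode` with 
the same semantics so it runs without a ZooKeeper server. `VersionedNode`, 
`update_state`, `add_replica`, and the `replicas` state layout are illustrative 
names, not Solr's API.

```python
import json
import threading
from typing import Callable, Tuple


class BadVersionError(Exception):
    """Stand-in for ZooKeeper's BadVersion failure on a stale expected version."""


class VersionedNode:
    """Hypothetical in-memory stand-in for one per-collection state znode."""

    def __init__(self, data: bytes) -> None:
        self._lock = threading.Lock()
        self._data = data
        self._version = 0

    def get(self) -> Tuple[bytes, int]:
        with self._lock:
            return self._data, self._version

    def set(self, data: bytes, expected_version: int) -> int:
        # Compare and set: succeed only if nobody wrote since expected_version.
        with self._lock:
            if self._version != expected_version:
                raise BadVersionError(f"expected {expected_version}, found {self._version}")
            self._data = data
            self._version += 1
            return self._version


def update_state(node: VersionedNode, mutate: Callable[[dict], dict],
                 max_attempts: int = 100) -> dict:
    """Apply `mutate` to the state with an optimistic compare-and-set retry loop."""
    for _ in range(max_attempts):
        raw, version = node.get()                    # 1. read fresh state and version
        new_state = mutate(json.loads(raw))          # 2. construct the new state
        try:
            node.set(json.dumps(new_state).encode(), version)  # 3. compare and set
            return new_state
        except BadVersionError:
            continue                                 # 4. lost the race: back to step 1
    raise RuntimeError("gave up after too many compare-and-set conflicts")


def add_replica(name: str) -> Callable[[dict], dict]:
    """Hypothetical mutation: append a replica name to the collection state."""
    def mutate(state: dict) -> dict:
        state["replicas"] = state.get("replicas", []) + [name]
        return state
    return mutate


if __name__ == "__main__":
    node = VersionedNode(json.dumps({"replicas": []}).encode())
    threads = [threading.Thread(target=update_state, args=(node, add_replica(f"core_{i}")))
               for i in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    data, version = node.get()
    print(sorted(json.loads(data)["replicas"]), version)
```

Each failed compare and set means another writer succeeded, so with eight 
concurrent writers no thread retries more than seven times, and the final 
version is 8 with all eight replicas present.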

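For contrast, here is a hypothetical sketch of the batching the issue 
description credits to the overseer queue: a single consumer drains many 
queued state updates and writes the merged state once. The 
(collection, replica, status) message shape, `CountingNode`, and 
`process_batched` are assumptions for illustration, not Solr's actual message 
format or code.

```python
import json
from collections import deque


class CountingNode:
    """Hypothetical stand-in for clusterstate.json that counts writes."""

    def __init__(self) -> None:
        self.data = json.dumps({}).encode()
        self.writes = 0

    def write(self, data: bytes) -> None:
        self.data = data
        self.writes += 1


def process_batched(queue: deque, node: CountingNode, max_batch: int = 1000) -> int:
    """Drain up to max_batch queued updates and write the merged state once."""
    state = json.loads(node.data)
    applied = 0
    while queue and applied < max_batch:
        collection, replica, status = queue.popleft()
        state.setdefault(collection, {})[replica] = status
        applied += 1
    if applied:
        node.write(json.dumps(state).encode())  # one write for the whole batch
    return applied


if __name__ == "__main__":
    queue = deque(("collection1", f"core_{i}", "active") for i in range(5))
    node = CountingNode()
    print(process_batched(queue, node), node.writes)
```

Five queued updates produce one write instead of five. Per-collection state 
with a compare-and-set loop gives up this saving, which is why the proposal 
keeps batching for operations on the main clusterstate.json.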


--
This message was sent by Atlassian JIRA
(v6.2#6252)