[ 
https://issues.apache.org/jira/browse/SOLR-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890021#comment-15890021
 ] 

albert vico oton edited comment on SOLR-5872 at 3/9/17 11:51 AM:
-----------------------------------------------------------------

Hello, we are currently trying to deploy around 200 collections and 
SolrCloud can't handle it: it dies from the propagation of update_status 
messages every time we try to add a new collection. Each collection has 3 
replicas, and their sizes are not very large. Also, I do not see why 
collection A should be aware of collection B's state.

But moving to the topic: the overseer node dies because it cannot handle the 
stress caused by the flood of messages. IMHO we have a single point of 
failure in a distributed system, which is not recommended.

Since this could be useful for big fat shards, my suggestion would be to make 
this behavior optional, so that people like us, who need a more distributed 
approach, can make use of SolrCloud. Right now that is impossible. And 
I'm not talking about "thousands" of collections: with as few as 100 we 
are already seeing very bad performance.





> Eliminate overseer queue 
> -------------------------
>
>                 Key: SOLR-5872
>                 URL: https://issues.apache.org/jira/browse/SOLR-5872
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>
> The overseer queue is one of the busiest points in the entire system. The 
> raison d'être of the queue is to:
>  * Provide batching of operations for the main clusterstate.json so that 
> state updates are minimized 
>  * Avoid race conditions and ensure order
> Now, as we move the individual collection states out of the main 
> clusterstate.json, the batching is not useful anymore.
> Race conditions can easily be solved by using a compare-and-set in ZooKeeper. 
> The proposed solution is: whenever an operation needs to be performed 
> on the clusterstate, the same thread (and of course the same JVM) will
>  # read the fresh state and version of the zk node  
>  # construct the new state 
>  # perform a compare-and-set
>  # if the compare-and-set fails, go to step 1
> This should be limited to operations performed on external collections, 
> because batching would still be required for the others.
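
The read / construct / compare-and-set loop in the description above can be sketched roughly as follows. This is only an illustration, not Solr's code: `VersionedNode`, `BadVersionError`, `update_state` and the `max_retries` limit are hypothetical names, and the node is an in-memory stand-in for a ZooKeeper znode. In real ZooKeeper the equivalent of `compare_and_set` would be a `setData` call that passes the expected version and fails with a bad-version error when the znode has changed in the meantime.

```python
import threading


class BadVersionError(Exception):
    """Raised when the caller's expected version does not match the node's."""


class VersionedNode:
    """Hypothetical in-memory stand-in for a znode: data plus a version counter."""

    def __init__(self, data):
        self._lock = threading.Lock()
        self._data = data
        self._version = 0

    def read(self):
        # Return the current data together with the version it was read at.
        with self._lock:
            return self._data, self._version

    def compare_and_set(self, new_data, expected_version):
        # Write only if nobody has written since expected_version was read.
        with self._lock:
            if self._version != expected_version:
                raise BadVersionError(expected_version)
            self._data = new_data
            self._version += 1


def update_state(node, modify, max_retries=1000):
    """Apply modify(state) to the node using the four steps from the issue."""
    for _ in range(max_retries):
        state, version = node.read()                   # 1. read fresh state and version
        new_state = modify(state)                      # 2. construct the new state
        try:
            node.compare_and_set(new_state, version)   # 3. perform a compare-and-set
            return new_state
        except BadVersionError:
            continue                                   # 4. on failure, go back to step 1
    raise RuntimeError("gave up after too many conflicting updates")
```

Because every writer retries against the latest version, no update is lost and no central queue is needed to order them. The cost is repeated work under contention, which is presumably why the description limits the approach to per-collection state.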



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
