[ https://issues.apache.org/jira/browse/SOLR-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938539#comment-13938539 ]
Mark Miller commented on SOLR-5872:
-----------------------------------

bq. With the overseer queues, each state update is 4+ zookeeper writes

Given the numbers I've seen published for ZK performance, it seems like that should not be a big deal in typical cases?

bq. Empirically, we have definitely seen the workqueue back up with lots of items during a node bounce

I'm not surprised - most of this code has not been optimized or investigated thoroughly. The original author of much of the Overseer code has moved on, and it likely has not received as much attention as it deserves over the past year. Until someone looks into the current issues closely, though, it's hard to recommend rewriting this whole, very important piece.

bq. If batching really is so important, there's no batching for external collection state updates.

I'm not fully up on "external collections", but AFAIK it's part of some other work to support tons of collections that I'm not fully sold on yet either :)

bq. In a "normal" rolling bounce where instances are restarted one-by-one, in the same order each time, the Overseer is killed at each instance restart, thus hindering the recovery process by gating state transition.

This points out another issue that we might be able to address.

Without having looked closely at the issues brought up (and I don't see evidence anyone else has either), it's too early to conclude that the whole thing has to be replaced.

A couple of issues with the old implementation:

* With every node updating the whole cluster state on each state change, the clusterstate.json file is read far too often. The workaround you guys are proposing for that appears to be having clients update the clusterstate only when they run into an error - but I'm not sold that that is the best architecture for the future either. That's a complicated change to make, with many ramifications for future development.
* Some things that are in the clusterstate now, and that could be in the future, are not so easily handled with the non-Overseer strategy - like marking who is the leader. You have to have the Overseer running its own special thread to inject and remove information.
* As things are, on something like cluster startup, there will be tons of reads and writes of clusterstate.json - a flood of attempts and retries to update it in ZooKeeper.

For further discussion of the change, you should find background by searching the archives.

There is a strong argument to be made that we should first investigate the performance issues with the current strategy. ZooKeeper is pretty fast - these state updates are tiny and batched. It seems like we should be able to do a lot better without throwing out code that has been getting hardened for a long time now.

> Eliminate overseer queue
> -------------------------
>
>                 Key: SOLR-5872
>                 URL: https://issues.apache.org/jira/browse/SOLR-5872
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>
> The overseer queue is one of the busiest points in the entire system. The
> raison d'être of the queue is to
> * Provide batching of operations for the main clusterstate.json so that
> state updates are minimized
> * Avoid race conditions and ensure order
> Now, as we move the individual collection states out of the main
> clusterstate.json, the batching is not useful anymore.
> Race conditions can easily be solved by using a compare and set in ZooKeeper.
> The proposed solution is: whenever an operation needs to be performed
> on the clusterstate, the same thread (and of course the same JVM) should
> # read the fresh state and version of the zk node
> # construct the new state
> # perform a compare and set
> # if the compare and set fails, go to step 1
> This should apply only to operations performed on external collections,
> because batching would be required for the others.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
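The four-step loop in the proposal (read state and version, construct the new state, compare and set, retry on failure) is ordinary optimistic concurrency. Below is a minimal sketch of it, assuming an in-memory stand-in rather than the real ZooKeeper client or any Solr code: `VersionedNode`, `casUpdate`, `bump`, `runContended`, and the replica-counter map used as "state" are all hypothetical. `VersionedNode.compareAndSet` mimics the versioned write of ZooKeeper's `setData(path, data, version)`, which rejects the write (with `KeeperException.BadVersionException`) when the version does not match the znode's current one; here it simply returns `false`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.UnaryOperator;

public class CasUpdateSketch {

    /** The node's data together with the version it was read at. */
    record Snapshot(Map<String, Integer> data, int version) {}

    /**
     * Hypothetical in-memory stand-in for one per-collection state znode.
     * Like a versioned ZooKeeper setData, a write is accepted only if
     * expectedVersion equals the current version; each accepted write bumps it.
     */
    static final class VersionedNode {
        private Map<String, Integer> data;
        private int version = 0;

        VersionedNode(Map<String, Integer> initial) { data = Map.copyOf(initial); }

        synchronized Snapshot read() { return new Snapshot(data, version); }

        /** Returns false (instead of throwing BadVersion) if another writer got there first. */
        synchronized boolean compareAndSet(Map<String, Integer> newData, int expectedVersion) {
            if (version != expectedVersion) return false;
            data = Map.copyOf(newData);
            version++;
            return true;
        }
    }

    /** Steps 1-4 of the proposal; returns how many attempts the update took. */
    static int casUpdate(VersionedNode node, UnaryOperator<Map<String, Integer>> change) {
        int attempts = 0;
        while (true) {
            attempts++;
            Snapshot current = node.read();                            // 1. fresh state + version
            Map<String, Integer> next = change.apply(current.data());  // 2. build the new state
            if (node.compareAndSet(next, current.version())) {         // 3. compare and set
                return attempts;
            }
            // 4. the version moved on: loop back to step 1 and rebuild from fresh state
        }
    }

    /** Hypothetical state change: bump one replica's counter in a copy of the map. */
    static Map<String, Integer> bump(Map<String, Integer> state, String replica) {
        Map<String, Integer> copy = new HashMap<>(state);
        copy.merge(replica, 1, Integer::sum);
        return copy;
    }

    /** Runs `threads` writers, each applying `updates` CAS updates to one shared node. */
    static Snapshot runContended(int threads, int updates, AtomicInteger totalAttempts) {
        VersionedNode node = new VersionedNode(Map.of());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            String replica = "replica" + t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < updates; i++) {
                    totalAttempts.addAndGet(casUpdate(node, s -> bump(s, replica)));
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return node.read();
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();
        Snapshot end = runContended(8, 1000, attempts);
        // No update is lost: every replica counter reaches 1000 and the version reaches 8000.
        System.out.println("version=" + end.version() + " state=" + end.data());
        System.out.println("attempts=" + attempts.get() + " (retries = attempts - 8000)");
    }
}
```

The key design point is that the change function is re-applied to freshly read state on every retry, never to the stale copy that lost the race. That is why no update is lost without a central queue. The cost is that every writer competing for the same node may retry several times, which is the contention trade-off against batching discussed in the comment above.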