[ https://issues.apache.org/jira/browse/SOLR-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938670#comment-13938670 ]
Jessica Cheng commented on SOLR-5872:
-------------------------------------

<quote>For further discussion around the change, there should be background if you search the archives.</quote>

If you wouldn't mind terribly, would you please paste links to a few relevant threads in the archive? (Sorry, I'm not yet familiar with all the keywords, archives, etc.)

<quote>There is a strong argument to be made that we should first investigate the performance issues with the current strategy. ZooKeeper is pretty fast - these state updates are tiny and batched. It seems like we should be able to do a lot better without throwing out code that has been getting hardened for a long time now.</quote>

I see where your hesitation is now, and I can definitely agree. It sounds like there are a few points to investigate in the current system before we attempt to change anything:
- Why is the Overseer so slow at updating cluster state? What's causing the build-up of queue messages during a restart?
- What can we do to generally solve the problem of the Overseer being killed on every instance restart in a rolling bounce?
- How much is actually batched?

My gut is that for external collections, batching won't be of much benefit (except for the super-large collection case that Yonik mentioned), but I agree that if the current system can be hardened to work even for those, then the simplicity of one code path should be preferred over ultra-optimizing for a non-issue (assuming the first two points above can be "fixed").

> Eliminate overseer queue
> -------------------------
>
>                 Key: SOLR-5872
>                 URL: https://issues.apache.org/jira/browse/SOLR-5872
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>
> The overseer queue is one of the busiest points in the entire system.
> The raison d'être of the queue is to:
> * Provide batching of operations for the main clusterstate.json so that
> state updates are minimized
> * Avoid race conditions and ensure order
>
> Now, as we move the individual collection states out of the main
> clusterstate.json, the batching is not useful anymore.
> Race conditions can easily be solved by using a compare and set in ZooKeeper.
> The proposed solution is that, whenever an operation needs to be performed
> on the clusterstate, the same thread (and of course the same JVM) should:
> # read the fresh state and version of the zk node
> # construct the new state
> # perform a compare and set
> # if the compare and set fails, go to step 1
>
> This should be limited to operations performed on external collections,
> because batching would still be required for the others.
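
The read / construct / compare-and-set / retry steps from the issue description can be sketched roughly as follows. This is only an illustration, not the Overseer code or a proposed patch: `VersionedNode` and `CasStateUpdater` are hypothetical names, the node is an in-memory stand-in rather than a real ZooKeeper client, and the `maxAttempts` bound is an added assumption (the description simply says to go back to step 1). With the actual ZooKeeper Java client, the equivalent calls would presumably be `getData(path, watch, stat)` to read the data and version, and `setData(path, data, version)`, which fails with `KeeperException.BadVersionException` when the version no longer matches.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

/** Hypothetical in-memory stand-in for a versioned ZooKeeper node (not the real client API). */
final class VersionedNode {
    /** Snapshot of the node's data and version, taken together. */
    static final class Snapshot {
        final byte[] data;
        final int version;
        Snapshot(byte[] data, int version) { this.data = data; this.version = version; }
    }

    private byte[] data;
    private int version = 0;

    VersionedNode(byte[] initial) { this.data = initial.clone(); }

    synchronized Snapshot read() { return new Snapshot(data.clone(), version); }

    /** Writes only if expectedVersion matches the current version; returns false otherwise. */
    synchronized boolean compareAndSet(int expectedVersion, byte[] newData) {
        if (expectedVersion != version) {
            return false;
        }
        data = newData.clone();
        version++;
        return true;
    }
}

/** Hypothetical helper that applies a state change with optimistic retries. */
final class CasStateUpdater {
    /** Returns the number of attempts used; maxAttempts is an assumption, not part of the proposal. */
    static int update(VersionedNode node, UnaryOperator<String> change, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Step 1: read the fresh state and its version.
            VersionedNode.Snapshot current = node.read();
            String oldState = new String(current.data, StandardCharsets.UTF_8);
            // Step 2: construct the new state from what was just read.
            String newState = change.apply(oldState);
            // Step 3: write only if nobody else changed the node in the meantime.
            if (node.compareAndSet(current.version, newState.getBytes(StandardCharsets.UTF_8))) {
                return attempt;
            }
            // Step 4: the compare and set failed, so loop and re-read.
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }
}
```

A caller would pass the state transformation as a function, for example `CasStateUpdater.update(node, state -> state + "...", 5)`. Because the new state is always rebuilt from a fresh read, a lost race costs one extra round trip rather than a lost update, which is the property the queue currently provides through ordering.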