Why would other nodes need to see stale state? If they really need intermediate state changes, that sounds like a problem.
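To restate my local-queue idea in code: if every state message is complete rather than a delta, the sender only needs to keep the newest message per core. A rough, untested sketch — publish() is a stand-in for the real overseer-queue write, not an actual Solr API:

import java.util.LinkedHashMap;
import java.util.Map;

// Sender-side coalescing sketch: one pending message per core, so a
// burst of updates collapses to just the latest before anything hits ZK.
// publish() is a placeholder, not a real Solr API.
class CoalescingStatePublisher {

  private final Map<String, byte[]> pending = new LinkedHashMap<String, byte[]>();

  public synchronized void enqueue(String coreName, byte[] stateMessage) {
    pending.put(coreName, stateMessage); // any older message for this core is dropped
  }

  // Called on a timer or after a quiet period: send only the survivors.
  public synchronized void flush() {
    for (Map.Entry<String, byte[]> e : pending.entrySet()) {
      publish(e.getKey(), e.getValue()); // e.g. offer to the overseer queue
    }
    pending.clear();
  }

  private void publish(String coreName, byte[] stateMessage) {
    // stand-in for the actual queue write
  }
}

The receiving end could do the same thing before processing: drain everything available, keep the newest entry per core, and handle only those.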
wunder
Walter Underwood
[email protected]
http://observer.wunderwood.org/ (my blog)

> On Nov 23, 2016, at 2:53 PM, Mark Miller <[email protected]> wrote:
>
> In many cases other nodes need to see a progression of state changes. You
> really have to clear the deck and try to start from 0.
>
> On Wed, Nov 23, 2016 at 5:50 PM Walter Underwood <[email protected]> wrote:
> If the queue is local and the state messages are complete, the local queue
> should only send the latest, most accurate update. The rest can be skipped.
>
> The same could be done on the receiving end. Suck the queue dry, then choose
> the most recent.
>
> If the updates depend on previous updates, it would be a lot more work to
> compile the latest delta.
>
> wunder
> Walter Underwood
> [email protected]
> http://observer.wunderwood.org/ (my blog)
>
>
>> On Nov 23, 2016, at 2:45 PM, Mark Miller <[email protected]> wrote:
>>
>> I talked about this type of thing with Jessica at Lucene/Solr Revolution.
>> One thing is, when you reconnect after losing your ZK connection, it should
>> now efficiently set every core as down in a single command, not one command
>> per core. Beyond that, any single node knows how fast it's sending overseer
>> updates. Each should have a governor. If the rate is too high, a node should
>> know it's best to just give up and assume things are screwed. It could try
>> and reset from ground zero.
>>
>> There are other things that can be done, but given the current design, the
>> simplest win is that a replica can easily prevent itself from spamming the
>> overseer queue.
>>
>> Mark
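[Interjecting on the governor idea above: that seems cheap to bolt on. A rough, untested sketch — the threshold, the window, and resetFromGroundZero() are all invented for illustration, not existing Solr code.]

// Governor sketch: each node tracks its own overseer-update send rate
// and, past a threshold, stops spamming and resets itself instead.
class OverseerUpdateGovernor {

  private static final int MAX_UPDATES_PER_WINDOW = 1000; // invented threshold
  private static final long WINDOW_MS = 60 * 1000L;       // invented window

  private long windowStart = System.currentTimeMillis();
  private int sentInWindow = 0;

  // Returns true if the update may be sent; false means "give up,
  // assume things are screwed, and start over from a clean slate".
  public synchronized boolean tryAcquire() {
    long now = System.currentTimeMillis();
    if (now - windowStart > WINDOW_MS) {
      windowStart = now;
      sentInWindow = 0;
    }
    if (++sentInWindow > MAX_UPDATES_PER_WINDOW) {
      resetFromGroundZero();
      return false;
    }
    return true;
  }

  private void resetFromGroundZero() {
    // e.g. publish one node-wide "down", clear local state, re-register
  }
}

Every publish attempt would go through tryAcquire() first; a false return is the "assume things are screwed" signal.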
>> On Wed, Nov 23, 2016 at 5:05 PM Scott Blum <[email protected]> wrote:
>> I've been fighting fires the last day where some of our Solr nodes have
>> long GC pauses that cause them to lose their ZK connection and have to
>> reconnect. That would be annoying but survivable, although obviously it's
>> something I want to fix.
>>
>> But what makes it fatal is the current design of the state update queue.
>>
>> Every time one of our nodes flaps, it ends up shoving thousands of state
>> updates and leader requests onto the queue, most of them ultimately futile.
>> By the time the state is actually published, it's already stale. At one
>> point we had 400,000 items in the queue and I just had to declare
>> bankruptcy, delete the entire queue, and elect a new overseer. Later, we
>> had 70,000 items from several flaps that took an hour to churn through,
>> even after I'd shut down the problematic nodes. Again, almost entirely
>> useless, repetitive work.
>>
>> Digging through ZKController and related code, the current model just seems
>> terribly outdated and non-scalable now. If a node flaps for just a moment,
>> do we really need to laboriously mark every core's state down, just to mark
>> it up again? What purpose does this serve that isn't already served by the
>> global live_nodes presence indication and/or leader election nodes?
>>
>> Rebooting a node creates a similar set of problems: a couple hundred cores
>> end up generating thousands of ZK operations just to get back to normal
>> state.
>>
>> We're at enough of a breaking point that I have to do something here for
>> our own cluster. I would love to put my head together with some of the more
>> knowledgeable Solr operations folks to help redesign something that could
>> land in master and improve scalability for everyone. I'd also love to hear
>> about any prior art or experiments folks have done. And if there are
>> already efforts in progress to address this very issue, apologies for being
>> out of the loop.
>>
>> Thanks!
>> Scott
>>
>> --
>> - Mark
>> about.me/markrmiller
>
> --
> - Mark
> about.me/markrmiller
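P.S. On Scott's live_nodes question above: any client can already observe node liveness with a single children watch, with no per-core churn. A rough sketch against the stock ZooKeeper client (reconnect and error handling omitted; it also deliberately ignores per-replica detail like recovery state, which is Mark's point about nodes needing to see a progression of changes):

import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Keeps a local view of which nodes are alive by watching /live_nodes.
class LiveNodesWatcher implements Watcher {

  private final ZooKeeper zk;
  private volatile Set<String> liveNodes = new HashSet<String>();

  LiveNodesWatcher(ZooKeeper zk) throws KeeperException, InterruptedException {
    this.zk = zk;
    refresh();
  }

  public void process(WatchedEvent event) {
    try {
      refresh(); // re-read the children and re-arm the watch on any change
    } catch (Exception e) {
      // a real implementation would retry / resubscribe here
    }
  }

  private void refresh() throws KeeperException, InterruptedException {
    List<String> children = zk.getChildren("/live_nodes", this);
    liveNodes = new HashSet<String>(children);
  }

  // One watch replaces thousands of per-core "down then up" queue entries,
  // at least for plain reachability.
  public boolean isLive(String nodeName) {
    return liveNodes.contains(nodeName);
  }
}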
