I didn't say stale state though, actually. I said state progressions.

On Wed, Nov 23, 2016 at 6:03 PM Mark Miller <[email protected]> wrote:
> Because that's how it works.
>
> On Wed, Nov 23, 2016 at 5:57 PM Walter Underwood <[email protected]> wrote:
>
> > Why would other nodes need to see stale state?
> >
> > If they really need intermediate state changes, that sounds like a problem.
> >
> > wunder
> > Walter Underwood
> > [email protected]
> > http://observer.wunderwood.org/ (my blog)
> >
> > On Nov 23, 2016, at 2:53 PM, Mark Miller <[email protected]> wrote:
> >
> > > In many cases other nodes need to see a progression of state changes. You really have to clear the deck and try to start from 0.
> > >
> > > On Wed, Nov 23, 2016 at 5:50 PM Walter Underwood <[email protected]> wrote:
> > >
> > > > If the queue is local and the state messages are complete, the local queue should only send the latest, most accurate update. The rest can be skipped.
> > > >
> > > > The same could be done on the receiving end. Suck the queue dry, then choose the most recent.
> > > >
> > > > If the updates depend on previous updates, it would be a lot more work to compile the latest delta.
> > > >
> > > > wunder
> > > > Walter Underwood
> > > > [email protected]
> > > > http://observer.wunderwood.org/ (my blog)
> > > >
> > > > On Nov 23, 2016, at 2:45 PM, Mark Miller <[email protected]> wrote:
> > > >
> > > > > I talked about this type of thing with Jessica at Lucene/Solr Revolution. One thing is, when you reconnect after losing the connection to ZK, it should now efficiently set every core as down in a single command, not one command per core. Beyond that, any single node knows how fast it's sending overseer updates. Each should have a governor. If the rate is too high, a node should know it's best to just give up and assume things are screwed. It could try and reset from ground zero.
> > > > >
> > > > > There are other things that can be done, but given the current design, the simplest win is that a replica can easily prevent itself from spamming the overseer queue.
> > > > >
> > > > > Mark
> > > > >
> > > > > On Wed, Nov 23, 2016 at 5:05 PM Scott Blum <[email protected]> wrote:
> > > > >
> > > > > > I've been fighting fires the last day where certain of our Solr nodes will have long GC pauses that cause them to lose their ZK connection and have to reconnect. That would be annoying, but survivable, although obviously it's something I want to fix.
> > > > > >
> > > > > > But what makes it fatal is the current design of the state update queue.
> > > > > >
> > > > > > Every time one of our nodes flaps, it ends up shoving thousands of state updates and leader requests onto the queue, most of them ultimately futile. By the time the state is actually published, it's already stale. At one point we had 400,000 items in the queue and I just had to declare bankruptcy, delete the entire queue, and elect a new overseer. Later, we had 70,000 items from several flaps that took an hour to churn through, even after I'd shut down the problematic nodes. Again, almost entirely useless, repetitive work.
> > > > > >
> > > > > > Digging through ZKController and related code, the current model just seems terribly outdated and non-scalable now. If a node flaps for just a moment, do we really need to laboriously mark every core's state down, just to mark it up again? What purpose does this serve that isn't already served by the global live_nodes presence indication and/or leader election nodes?
> > > > > >
> > > > > > Rebooting a node creates a similar set of problems: a couple hundred cores end up generating thousands of ZK operations just to get back to normal state.
> > > > > >
> > > > > > We're at enough of a breaking point that I *have* to do something here for our own cluster.
> > > > > > I would love to put my head together with some of the more knowledgeable Solr operations folks to help redesign something that could land in master and improve scalability for everyone. I'd also love to hear about any prior art or experiments folks have done. And if there are already efforts in progress to address this very issue, apologies for being out of the loop.
> > > > > >
> > > > > > Thanks!
> > > > > > Scott

--
- Mark
about.me/markrmiller
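
(Purely as illustration of Walter's receive-side suggestion above: drain whatever is currently queued and keep only the newest message per core, so older, already-stale updates for the same core never get processed. The StateUpdate record and drainLatest method below are hypothetical names for the sketch, not anything in Solr's actual Overseer code.)

// Minimal sketch, assuming each queued message carries a core name, a state,
// and a monotonically increasing timestamp.
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

public class CoalescingDrain {

    // Hypothetical stand-in for a queued state update message.
    record StateUpdate(String coreName, String state, long timestamp) {}

    /** Drain everything currently in the queue, keeping only the latest update per core. */
    static Map<String, StateUpdate> drainLatest(Queue<StateUpdate> queue) {
        Map<String, StateUpdate> latest = new LinkedHashMap<>();
        StateUpdate msg;
        while ((msg = queue.poll()) != null) {
            // A newer message for the same core replaces the older one.
            latest.merge(msg.coreName(), msg,
                    (old, neu) -> neu.timestamp() >= old.timestamp() ? neu : old);
        }
        return latest;
    }

    public static void main(String[] args) {
        Queue<StateUpdate> queue = new ArrayDeque<>();
        queue.add(new StateUpdate("core1", "down", 1));
        queue.add(new StateUpdate("core1", "recovering", 2));
        queue.add(new StateUpdate("core1", "active", 3));
        queue.add(new StateUpdate("core2", "down", 4));
        // Only two updates survive: core1 -> active, core2 -> down.
        drainLatest(queue).forEach((core, m) ->
                System.out.println(core + " -> " + m.state()));
    }
}

Mark's objection in the thread is exactly the caveat here: if consumers need to see the progression (e.g. down -> recovering -> active), collapsing to the latest state is only safe where intermediate states genuinely don't matter.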
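(And a rough sketch of the per-node "governor" Mark mentions: track how many state updates the node has published within a sliding window, and once the rate crosses a threshold, stop publishing individual updates and fall back to a full reset. StateUpdateGovernor and resetFromGroundZero are made-up names for illustration, not existing Solr APIs.)

import java.util.ArrayDeque;
import java.util.Deque;

public class StateUpdateGovernor {
    private final int maxUpdates;        // updates allowed per window
    private final long windowMillis;     // sliding window size
    private final Deque<Long> recent = new ArrayDeque<>();

    public StateUpdateGovernor(int maxUpdates, long windowMillis) {
        this.maxUpdates = maxUpdates;
        this.windowMillis = windowMillis;
    }

    /** Returns true if this update may be published; false means "give up and reset". */
    public synchronized boolean tryPublish(long nowMillis) {
        // Drop timestamps that have fallen out of the window.
        while (!recent.isEmpty() && nowMillis - recent.peekFirst() > windowMillis) {
            recent.pollFirst();
        }
        if (recent.size() >= maxUpdates) {
            return false; // rate too high: assume things are screwed, reset instead
        }
        recent.addLast(nowMillis);
        return true;
    }

    // Caller-side usage sketch:
    // if (!governor.tryPublish(System.currentTimeMillis())) {
    //     resetFromGroundZero();  // hypothetical: one bulk "all cores down" + re-register
    // }
}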
