The second thing I want to look at is replacing queued state update operations with local CAS loops for state format v2 collections, with an in-process, collection-level mutex to ensure that a node isn't contending with itself. This would only be for state updates; anything more complex would still go to Overseer.
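To make that concrete, here's a minimal sketch of the kind of per-collection CAS loop I mean, using plain ZK getData/setData with a version check. The class name and the byte[] mutation callback are just illustrative, and the state.json path is the assumed state format v2 layout; the real thing would edit the collection state through the existing cluster-state code rather than raw bytes:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.UnaryOperator;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

/** Illustrative only: CAS state updates with a per-collection in-process lock. */
public class LocalStateUpdater {

  private final ZooKeeper zk;
  // One lock per collection so cores on this node queue up locally
  // instead of burning CAS retries against each other.
  private final Map<String, ReentrantLock> collectionLocks = new ConcurrentHashMap<>();

  public LocalStateUpdater(ZooKeeper zk) {
    this.zk = zk;
  }

  public void updateState(String collection, UnaryOperator<byte[]> mutation)
      throws KeeperException, InterruptedException {
    String path = "/collections/" + collection + "/state.json";
    ReentrantLock lock = collectionLocks.computeIfAbsent(collection, c -> new ReentrantLock());
    lock.lock();
    try {
      while (true) {
        Stat stat = new Stat();
        byte[] current = zk.getData(path, false, stat);
        byte[] updated = mutation.apply(current);
        try {
          // Conditional write: succeeds only if nobody changed the znode since we read it.
          zk.setData(path, updated, stat.getVersion());
          return;
        } catch (KeeperException.BadVersionException e) {
          // Another node won the race; re-read and retry.
        }
      }
    } finally {
      lock.unlock();
    }
  }
}

The per-collection lock only keeps cores on the same node from racing each other; cross-node races still resolve through the version check.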
Then at least if a Solr node gets kill -9'd it immediately stops hitting ZK instead of leaving a bunch of garbage in the queue. This would require some changes in ZKStateWriter's assumptions.

On Wed, Nov 23, 2016 at 6:59 PM, Scott Blum <[email protected]> wrote:

> On Wed, Nov 23, 2016 at 5:45 PM, Mark Miller <[email protected]>
> wrote:
>
>> One thing is, when you reconnect after connecting to ZK, it should now
>> efficiently set every core as down in a single command, not each core.
>
> Yeah, I backported downnode, but it still actually takes a long time for
> overseer to execute, and there can be a bunch of these in the queue for the
> same node.
>
> On Wed, Nov 23, 2016 at 5:53 PM, Mark Miller <[email protected]>
> wrote:
>
>> In many cases other nodes need to see a progression of state changes. You
>> really have to clear the deck and try to start from 0.
>
> This is exactly the kind of detail I'm looking for. Can you elaborate?
>
> Unless we can come up with a better idea, my first experiment will be to
> try to eliminate the "DOWN" replica state in all practical cases, relying
> only on careful management of live_nodes presence. For example, the
> startup sequence (or reconnect sequence) would skip marking replicas down
> and just ensure they're ACTIVE or else put them into RECOVERING, join shard
> leader elections, and finally join live_nodes when that's done.
>
> What land mines am I likely to run into or existing assumptions am I
> likely to violate if I do that?
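For concreteness, the reconnect sequence I'm proposing above would look roughly like this. None of these methods exist in SolrCloud today; the interface is a hypothetical placeholder just to make the ordering explicit (no blanket DOWN publish, a per-core ACTIVE-or-RECOVERING decision, then leader election, and live_nodes joined last):

import java.util.List;

/** Illustrative only: a sketch of the proposed startup/reconnect ordering. */
public class ReconnectSequenceSketch {

  /** Hypothetical handle to the node's cluster-state operations. */
  interface NodeStateOps {
    List<String> localCores();
    boolean isInSyncWithLeader(String core);
    void publishActive(String core);
    void publishRecovering(String core);
    void startRecovery(String core);
    void joinLeaderElection(String core);
    void registerLiveNode();
  }

  void onZkReconnect(NodeStateOps node) {
    for (String core : node.localCores()) {
      // No blanket DOWN publish: decide per core.
      if (node.isInSyncWithLeader(core)) {
        node.publishActive(core);      // already caught up, keep serving
      } else {
        node.publishRecovering(core);  // advertise RECOVERING, never DOWN
        node.startRecovery(core);
      }
      node.joinLeaderElection(core);
    }
    // Join live_nodes last, so other nodes never see this node as live
    // while its replica states are still stale.
    node.registerLiveNode();
  }
}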
