Re: Shard takeover behavior

Ravikumar Govindarajan Thu, 06 Mar 2014 03:32:14 -0800

I just came across the zk.session.timeout variable while reading more about
this problem.

This only triggers a dead-node notification once the configured timeout has
been exceeded. Setting it to 3-4 minutes should be fine for OOMs and rolling
restarts.
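One caveat worth checking, sketched below under the assumption that zk.session.timeout ends up as the session timeout the client requests from ZooKeeper: the ZooKeeper server clamps every requested timeout to the range [minSessionTimeout, maxSessionTimeout], which default to 2 and 20 times tickTime. With tickTime=2000 ms (the value in the sample zoo.cfg), a 4-minute request would be cut to 40 seconds unless maxSessionTimeout is raised on the server. The class and method names are hypothetical; only the clamping rule mirrors the server's behavior.

```java
// Hypothetical helper (not a ZooKeeper API): reproduces how a ZooKeeper server
// bounds the session timeout a client requests.
final class SessionTimeoutSketch {

    /**
     * Returns the session timeout (ms) the server would grant.
     * A null override means "use the server default":
     * minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime.
     */
    static int negotiatedTimeoutMs(int requestedMs, int tickTimeMs,
                                   Integer minOverrideMs, Integer maxOverrideMs) {
        int min = (minOverrideMs != null) ? minOverrideMs : 2 * tickTimeMs;
        int max = (maxOverrideMs != null) ? maxOverrideMs : 20 * tickTimeMs;
        // Raise to the minimum first, then cap at the maximum.
        return Math.min(max, Math.max(requestedMs, min));
    }

    public static void main(String[] args) {
        int fourMinutesMs = 4 * 60 * 1000;
        // Defaults with tickTime=2000: capped at 20 * 2000 = 40000 ms.
        System.out.println(negotiatedTimeoutMs(fourMinutesMs, 2000, null, null));
        // With maxSessionTimeout=300000 configured on the server, 240000 ms is granted.
        System.out.println(negotiatedTimeoutMs(fourMinutesMs, 2000, null, 300000));
    }
}
```

In other words, the client-side setting alone may not buy the 3-4 minute window; the server's maxSessionTimeout has to allow it too.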

The only extra thing I am looking for is a way to divert search calls to a
read-only shard instance during those 3-4 minutes, to avoid mini-outages.
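A minimal sketch of one way to divert searches during that window: the caller tries the primary shard instance first and falls back to a read-only replica when the call fails. Searcher and FailoverSearch are hypothetical names, and treating an IOException as "the primary is down" is an assumption; a real client would probably also want a short connect timeout so the fallback kicks in quickly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical sketch: Searcher and FailoverSearch are illustrative names.
final class FailoverSearch {

    /** Stand-in for whatever executes a query against a single shard instance. */
    interface Searcher {
        List<String> search(String query) throws IOException;
    }

    private final Searcher primary;
    private final Searcher readOnlyReplica;

    FailoverSearch(Searcher primary, Searcher readOnlyReplica) {
        this.primary = primary;
        this.readOnlyReplica = readOnlyReplica;
    }

    /**
     * Queries the primary; if it is unreachable (e.g. the process died but its
     * ZooKeeper session has not expired yet), answers from the read-only replica.
     * Throws only if both instances fail.
     */
    List<String> search(String query) {
        try {
            return primary.search(query);
        } catch (IOException primaryFailure) {
            try {
                return readOnlyReplica.search(query);
            } catch (IOException replicaFailure) {
                replicaFailure.addSuppressed(primaryFailure);
                throw new UncheckedIOException(replicaFailure);
            }
        }
    }
}
```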

--
Ravi



On Thu, Mar 6, 2014 at 3:34 PM, Ravikumar Govindarajan <
[email protected]> wrote:

> What do you think of giving extra leeway for shard-server failover
> cases?
>
> For example: whenever a shard-server process gets killed, the controller
> node does not immediately update the layout, but instead marks it as a suspect.
>
> When we have a read-only backup of the shard, searches can continue
> unhindered. Indexing during this time can be diverted to a queue, which
> stores the operations and retries them when the shard-server comes back online.
>
> If the shard-server does not come up within a configured number of attempts
> or amount of time, one controller server can authoritatively mark it as down
> and update the layout.
>
> --
> Ravi
>
>
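The suspect-before-down idea in the quoted proposal can be sketched as a small state machine: a lost shard-server moves from ONLINE to SUSPECT, indexing operations are queued while it is SUSPECT, they are replayed in order when the server returns, and after a configured number of failed retry attempts it becomes DOWN, the signal to update the layout. All names are hypothetical, the in-memory list stands in for the shard-server's index, and both a time-based limit and the question of which controller gets to declare DOWN are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of "mark as suspect before marking as down"; all names are illustrative.
final class SuspectShard {

    enum State { ONLINE, SUSPECT, DOWN }

    private final int maxAttempts;                            // configured retries before DOWN
    private final Deque<String> queued = new ArrayDeque<>();  // indexing ops held while not ONLINE
    private final List<String> delivered = new ArrayList<>(); // stand-in for the shard-server's index
    private State state = State.ONLINE;
    private int failedAttempts = 0;

    SuspectShard(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Controller saw the shard-server process go away: suspect it, but keep the layout. */
    void onServerLost() {
        if (state == State.ONLINE) {
            state = State.SUSPECT;
            failedAttempts = 0;
        }
    }

    /** Indexing goes straight through when ONLINE; otherwise it is queued for later replay. */
    void index(String op) {
        if (state == State.ONLINE) {
            delivered.add(op);
        } else {
            queued.addLast(op);
        }
    }

    /**
     * One retry attempt against a SUSPECT server. On success the queued ops are
     * replayed in order; after maxAttempts failures the shard becomes DOWN, which
     * is the point where a controller would update the layout.
     */
    State retry(boolean serverReachable) {
        if (state != State.SUSPECT) {
            return state;
        }
        if (serverReachable) {
            while (!queued.isEmpty()) {
                delivered.add(queued.pollFirst());
            }
            state = State.ONLINE;
        } else if (++failedAttempts >= maxAttempts) {
            state = State.DOWN;
        }
        return state;
    }

    State state() { return state; }
    int queuedCount() { return queued.size(); }
    List<String> delivered() { return delivered; }
}
```

This sketch leaves operations queued once the shard is DOWN; what should happen to them after the layout changes is not covered by the proposal.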
