Aaron, I was thinking about another way of utilizing read-only shards.
Instead of building logic/intelligence for detecting that a primary replica is struggling or down, can we opt for pushing the logic to the client side? We can take a few approaches, as below:

1. Query both primary and secondary shards in parallel and return whichever response comes first.
2. Query both primary and secondary shards in parallel. Wait for the primary's response for a configured delay; if it is not forthcoming, return the secondary's response (a rough sketch follows below).

These are useful only when the client agrees to a "stale-read" scenario. A "stale read" in this case means reading the last commit of the index.

What I am aiming at is that, in the case of layout-conscious apps [where the layout does not change when a VM is restarted after an update/crash/hang], we can always fall back on replica reads, resulting in greater availability but lesser consistency.

A secondary-replica layout needs to be present in ZK. Replica shards should always be served from a server other than the primary. Maybe we can switch off the buffer-cache for replica reads, as it is used only temporarily.

95% of apps queue their indexing operations and can always retry after the primary comes back online.
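Here is a rough sketch of option 2, just to make the idea concrete. SearchClient, Query and Result are hypothetical placeholders, not the actual Blur client API:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of option 2: fire primary and secondary reads in parallel,
// prefer the primary, and fall back to the (possibly stale) secondary
// after a configured delay. SearchClient, Query and Result are
// hypothetical placeholders, not the real Blur client API.
public class PreferPrimaryReader {

  public interface SearchClient { Result search(Query query); }
  public interface Query {}
  public interface Result {}

  private final SearchClient primary;
  private final SearchClient secondary;
  private final long primaryWaitMillis;

  public PreferPrimaryReader(SearchClient primary, SearchClient secondary,
                             long primaryWaitMillis) {
    this.primary = primary;
    this.secondary = secondary;
    this.primaryWaitMillis = primaryWaitMillis;
  }

  public Result search(Query query) throws Exception {
    CompletableFuture<Result> fromPrimary =
        CompletableFuture.supplyAsync(() -> primary.search(query));
    CompletableFuture<Result> fromSecondary =
        CompletableFuture.supplyAsync(() -> secondary.search(query));
    try {
      // Give the primary a configured head start.
      return fromPrimary.get(primaryWaitMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Primary did not answer in time; accept a stale read (last commit
      // of the index) from the secondary replica.
      return fromSecondary.get();
    }
  }
}

Option 1 would be the same two futures combined with CompletableFuture.anyOf(...), returning whichever completes first.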
Please let me know your views on this.

--
Ravi


On Sat, Mar 8, 2014 at 8:56 PM, Aaron McCurry <[email protected]> wrote:

> On Fri, Mar 7, 2014 at 5:42 AM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > > Well it works that way for OOMs and for when the process drops hard
> > > (think kill -9). However when a shard server is shut down it currently
> > > ends its session in ZooKeeper, thus triggering a layout change.
> >
> > Yes, maybe we can have a config to determine whether it should
> > end/maintain the session in ZK when doing a normal shutdown and a
> > subsequent restart. This way, both MTTR-conscious and layout-conscious
> > settings can be supported.
>
> That's a neat idea. Once we have shards being served on multiple servers
> we should definitely take a look at this. When we implement the
> multi-shard serving I would guess that there will be 2 layout strategies
> (they might be implemented together).
>
> 1. Would be to get the N replicas online on different servers.
> 2. Would be to pick the writing leader for the shard, assuming that it's
>    needed.
>
> > How do you think we can detect that a particular shard-server is
> > struggling/shut down and hence incoming search requests need to go to
> > some other server?
> >
> > I am listing a few paths off the top of my head:
> >
> > 1. Process baby-sitters like supervisord, alerting controllers
> > 2. Tracking the first network exception in the controller and diverting
> >    to a read-only instance. Periodically maybe re-try
> > 3. Take a statistics-based decision, based on previous response times,
> >    etc.
>
> Adding to this one, and this may be obvious, but measuring the response
> time in comparison with other shards. Meaning if the entire cluster is
> experiencing an increase in load and all response times are increasing,
> we wouldn't want to start killing off shard servers inadvertently.
> Looking for outliers.
>
> > 4. Build some kind of leasing mechanism in ZK etc.
>
> I think that all of these are good approaches. Likely, to determine that
> a node is misbehaving and should be killed/not used anymore, we would
> want multiple ways to measure that condition and then vote on the need
> to kick it out.
>
> Aaron
>
> > --
> > Ravi
> >
> > On Fri, Mar 7, 2014 at 8:01 AM, Aaron McCurry <[email protected]> wrote:
> >
> > > On Thu, Mar 6, 2014 at 6:30 AM, Ravikumar Govindarajan <
> > > [email protected]> wrote:
> > >
> > > > I came to know about the zk.session.timeout variable just now,
> > > > while reading more about this problem.
> > > >
> > > > This will only trigger the dead-node notification after the
> > > > configured timeout exceeds. Setting it to 3-4 mins must be fine for
> > > > OOMs and rolling restarts.
> > >
> > > Well it works that way for OOMs and for when the process drops hard
> > > (think kill -9). However when a shard server is shut down it
> > > currently ends its session in ZooKeeper, thus triggering a layout
> > > change.
> > >
> > > > The only extra stuff I am looking for is to divert search calls to
> > > > a read-only shard instance during this 3-4 mins time to avoid
> > > > mini-outages.
> > >
> > > Yes, and I think that the controllers will automatically spread the
> > > queries across those servers that are online. The BlurClient class
> > > already takes a list of connection strings and treats all connections
> > > as equals. For example, its current use is to provide the client with
> > > all the controllers' connection strings. Internally, if any one of
> > > the controllers goes down or has a network issue, another controller
> > > is automatically retried without the user having to do anything.
> > > There is back-off, ping, and pooling logic in the BlurClientManager
> > > that the BlurClient utilizes.
> > >
> > > Aaron
> > >
> > > > --
> > > > Ravi
> > > >
> > > > On Thu, Mar 6, 2014 at 3:34 PM, Ravikumar Govindarajan <
> > > > [email protected]> wrote:
> > > >
> > > > > What do you think of giving an extra leeway for shard-server
> > > > > failover cases?
> > > > >
> > > > > Ex: Whenever a shard-server process gets killed, the controller
> > > > > node does not immediately update the layout, but rather marks it
> > > > > as a suspect.
> > > > >
> > > > > When we have a read-only back-up of the shard, searches can
> > > > > continue unhindered. Indexing during this time can be diverted to
> > > > > a queue, which will store and retry ops when the shard server
> > > > > comes online again.
> > > > >
> > > > > Over a configured number of attempts/time, if the shard server
> > > > > does not come up, then one controller server can authoritatively
> > > > > mark it as down and update the layout.
> > > > >
> > > > > --
> > > > > Ravi
