>
> If I understand this one: favor the primary response until a certain
> amount of time has passed, then fall back to the secondary response,
> assuming it's available to return.


Exactly. This is one such option. Another option is the first-past-the-post
approach, where we simply return whichever shard responds first.
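
To make the two options concrete, here is a rough client-side sketch of
both strategies. It is only a sketch: the Callable tasks and the String
result type are placeholders I made up, not Blur's real client API.

import java.util.List;
import java.util.concurrent.*;

// Minimal sketch of the two client-side read strategies discussed above.
// The Callable<String> tasks stand in for the real primary/secondary
// shard queries; String is just a placeholder result type.
public class HedgedReader {

  private final ExecutorService pool = Executors.newCachedThreadPool();

  // Option 1: first-past-the-post. invokeAny returns the result of the
  // first task that completes successfully and cancels the other one.
  public String firstPastThePost(Callable<String> primary,
                                 Callable<String> secondary) throws Exception {
    return pool.invokeAny(List.of(primary, secondary));
  }

  // Option 2: favor the primary up to a configured delay, then fall back
  // to the secondary's (possibly stale) response.
  public String favorPrimary(Callable<String> primary,
                             Callable<String> secondary,
                             long primaryWaitMs) throws Exception {
    Future<String> p = pool.submit(primary);
    Future<String> s = pool.submit(secondary);
    try {
      return p.get(primaryWaitMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException latePrimary) {
      p.cancel(true);  // primary is late; accept the stale read
      return s.get();
    }
  }
}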

> Buffer cache?  Are you referring to block cache?


Yup, I was referring to the block cache here. But like you said, we can
just let it fall off the LRU.
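
Just to illustrate what "falling off the LRU" means for the secondary
shard's blocks, here is a toy access-ordered map. This is only a
conceptual stand-in, not Blur's block cache code.

import java.util.LinkedHashMap;
import java.util.Map;

// Toy LRU cache: a bounded, access-ordered LinkedHashMap. Once the cache
// is full, the least-recently-used entries (e.g. blocks read once for the
// secondary shard) simply age out as newer blocks come in; no explicit
// flush is needed.
public class ToyLruCache<K, V> extends LinkedHashMap<K, V> {

  private final int capacity;

  public ToyLruCache(int capacity) {
    super(16, 0.75f, true);  // accessOrder = true gives LRU ordering
    this.capacity = capacity;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > capacity;  // evict the eldest entry once over capacity
  }
}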

> The interesting thing here is that Blur is fully committed to disk (HDFS)
> upon each mutate

I think this is a new feature in Blur that I have missed. I will check it
out for sure. This also auto-solves the stale-read issue.

The problem now is that I am doing quite low-level changes on top of Blur.
Some of them are:

1. Online shard creation
2. Externalizing the RowId->Shard mapping via BlurPartitioner (see the
   sketch after this list)
3. Splitting shards upon reaching a configured size
4. A secondary read-only shard for availability

and many more such changes needed for our app.
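
For point 2 above, here is a rough sketch of the kind of externalized
mapping I mean, assuming a Blur-style hash partitioner (a hash of the row
id taken modulo the shard count). The class and method names are mine for
illustration, not Blur's.

import java.nio.charset.StandardCharsets;

// Hypothetical externalized RowId -> shard mapping. A deterministic hash
// of the row id, modulo the shard count, picks the owning shard, so any
// client can compute the mapping without asking a server.
public class RowIdShardMapper {

  private final int shardCount;

  public RowIdShardMapper(int shardCount) {
    this.shardCount = shardCount;
  }

  // Returns the index of the shard that should own the given row id.
  public int shardFor(String rowId) {
    byte[] bytes = rowId.getBytes(StandardCharsets.UTF_8);
    int h = 1;
    for (byte b : bytes) {
      h = 31 * h + b;  // simple deterministic hash; Blur's may differ
    }
    return (h & Integer.MAX_VALUE) % shardCount;  // clear the sign bit
  }
}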

I hope to share these changes and get feedback from the Blur community
once the system survives a couple of production cycles.

--
Ravi


On Mon, Mar 17, 2014 at 7:17 PM, Aaron McCurry <[email protected]> wrote:

> On Sat, Mar 15, 2014 at 12:57 PM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > Aaron,
> >
> > I was thinking about another way of utilizing read-only shards
> >
> > Instead of logic/intelligence for finding a primary replica
> > struggling/down, can we opt for pushing the logic to the client side?
> >
> > We can take a few approaches, as below:
> >
> > 1. Query both primary/secondary shards in parallel and return whichever
> > comes first
>
>
> > 2. Query both primary/secondary shards in parallel. Wait for the primary
> > response up to a configured delay. If it is not forthcoming, return the
> > secondary's response
> >
>
> If I understand this one: favor the primary response until a certain
> amount of time has passed, then fall back to the secondary response,
> assuming it's available to return.
>
>
> >
> > These are useful only when the client agrees to a "stale-read" scenario.
> > "stale-read" in this case will be the last commit of the index.
> >
> > What I am aiming at is that in the case of layout-conscious apps [the
> > layout does not change when a VM is restarted after an
> > update/crash/hang], we can always fall back on replica reads, resulting
> > in greater availability but lesser consistency
> >
> > A secondary-replica layout needs to be present in ZK. Replica shards
> > should always be served from a server other than the primary. Maybe we
> > can switch off the buffer cache for replica reads, as it is used only
> > temporarily
> >
>
> Buffer cache?  Are you referring to the block cache?  Or a query cache?
> Just as an FYI, Blur's query cache is currently disabled.  As for the
> block cache, maybe.  The block cache seems to help performance quite a
> bit and usually does so at little cost.  Also, we could flush the
> secondary shard from the cache from time to time.  Or we could just let
> it fall out of the LRU.
>
>
> >
> > 95% of apps queue their indexing operations and can always retry after
> > the primary comes back online.
> >
>
> The interesting thing here is that Blur is fully committed to disk (HDFS)
> upon each mutate.  So assuming that the secondary shard has refreshed, the
> primary shard being down just means that you can't write to that shard.
>  Reads should be in the same state.
>
>
> >
> > Please let me know your views on this
> >
>
> I like all these ideas; the only thing I would add is that we would need
> to build these sorts of options into Blur on a configured per-table
> basis.  Querying both primary and secondary shards at the same time could
> produce the most consistent response times, but at the cost of CPU
> resources (obviously).
>
> Thanks for the thoughts and ideas!  I like it!
>
> Aaron
>
>
> >
> > --
> > Ravi
> >
> >
> > On Sat, Mar 8, 2014 at 8:56 PM, Aaron McCurry <[email protected]>
> wrote:
> >
> > > On Fri, Mar 7, 2014 at 5:42 AM, Ravikumar Govindarajan <
> > > [email protected]> wrote:
> > >
> > > > >
> > > > > Well it works that way for OOMs and for when the process drops
> > > > > hard (think kill -9).  However, when a shard server is shut down
> > > > > it currently ends its session in ZooKeeper, thus triggering a
> > > > > layout change.
> > > >
> > > >
> > > > Yes, maybe we can have a config to determine whether it should
> > > > end/maintain the session in ZK when doing a normal shutdown and
> > > > subsequent restart. This way, both MTTR-conscious and
> > > > layout-conscious settings can be supported.
> > > >
> > >
> > > That's a neat idea.  Once we have shards being served on multiple
> > > servers we should definitely take a look at this.  When we implement
> > > multi-shard serving I would guess that there will be 2 layout
> > > strategies (they might be implemented together).
> > >
> > > 1. Would be to get the N replicas online on different servers.
> > > 2. Would be to pick the writing leader for the shard, assuming that
> > >    it's needed.
> > >
> > >
> > > >
> > > > How do you think we can detect that a particular shard-server is
> > > > struggling/shut-down and hence incoming search-requests need to go
> > > > to some other server?
> > > >
> > > > I am listing a few paths off the top of my head:
> > > >
> > > > 1. Process baby-sitters like supervisord, alerting controllers
> > > > 2. Tracking the first network-exception in the controller and
> > > >    diverting to the read-only instance. Periodically maybe retry
> > > > 3. Take a statistics-based decision, based on previous response
> > > >    times etc.
> > > >
> > >
> > > Adding to this one, and this may be obvious, but measuring the
> > > response time in comparison with other shards.  Meaning if the entire
> > > cluster is experiencing an increase in load and all response times
> > > are increasing, we wouldn't want to start killing off shard servers
> > > inadvertently.  Looking for outliers.
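
[A possible shape for the outlier check described above; the median
comparison and the factor of 3 are purely illustrative assumptions, not
anything in Blur.]

import java.util.Arrays;

// Flag a shard whose recent response time is far above the cluster-wide
// median. A uniform slowdown raises the median too, so it does not trip
// the check; only true outliers do.
public class LatencyOutlier {

  public static boolean isOutlier(long[] shardLatenciesMs, int shardIndex) {
    long[] sorted = shardLatenciesMs.clone();
    Arrays.sort(sorted);
    long median = Math.max(sorted[sorted.length / 2], 1);  // avoid zero
    return shardLatenciesMs[shardIndex] > 3 * median;
  }
}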
> > >
> > >
> > > > 4. Build some kind of leasing mechanism in ZK etc...
> > > >
> > >
> > > I think that all of these are good approaches.  Likely, to determine
> > > that a node is misbehaving and should be killed/not used anymore, we
> > > would want multiple ways to measure that condition and then vote on
> > > the need to kick it out.
> > >
> > >
> > > Aaron
> > >
> > > >
> > > > --
> > > > Ravi
> > > >
> > > >
> > > > On Fri, Mar 7, 2014 at 8:01 AM, Aaron McCurry <[email protected]>
> > > wrote:
> > > >
> > > > > On Thu, Mar 6, 2014 at 6:30 AM, Ravikumar Govindarajan <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > I came to know about the zk.session.timeout variable just now,
> > > > > > while reading more about this problem.
> > > > > >
> > > > > > This will only trigger the dead-node notification after the
> > > > > > configured timeout is exceeded. Setting it to 3-4 mins must be
> > > > > > fine for OOMs and rolling restarts.
> > > > > >
> > > > >
> > > > > Well it works that way for OOMs and for when the process drops
> > > > > hard (think kill -9).  However, when a shard server is shut down
> > > > > it currently ends its session in ZooKeeper, thus triggering a
> > > > > layout change.
> > > > >
> > > > >
> > > > > >
> > > > > > The only extra thing I am looking for is to divert search calls
> > > > > > to a read-only shard instance during this 3-4 minute window to
> > > > > > avoid mini-outages
> > > > > >
> > > > >
> > > > > Yes, and I think that the controllers will automatically spread
> > > > > the queries across those servers that are online.  The BlurClient
> > > > > class already takes a list of connection strings and treats all
> > > > > connections as equals.  For example, its current use is to provide
> > > > > the client with all the controllers' connection strings.
> > > > > Internally, if any one of the controllers goes down or has a
> > > > > network issue, another controller is automatically retried without
> > > > > the user having to do anything.  There is back-off, ping, and
> > > > > pooling logic in the BlurClientManager that the BlurClient
> > > > > utilizes.
> > > > >
> > > > > Aaron
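
[Side note on the retry behavior described above: conceptually it is
something like the loop below. This is a generic illustration of failover
with back-off, not BlurClientManager's actual code.]

import java.util.List;
import java.util.concurrent.Callable;

// Generic client-side failover sketch: try each connection in turn,
// sleeping a little longer after every full pass, until one call succeeds
// or the retry budget runs out.
public class FailoverCaller {

  public static <T> T call(List<Callable<T>> connections, int maxPasses)
      throws Exception {
    if (connections.isEmpty()) {
      throw new IllegalArgumentException("no connections given");
    }
    Exception last = null;
    for (int pass = 0; pass < maxPasses; pass++) {
      for (Callable<T> conn : connections) {
        try {
          return conn.call();  // first healthy connection wins
        } catch (Exception e) {
          last = e;  // remember the failure and try the next connection
        }
      }
      Thread.sleep(100L * (pass + 1));  // simple linear back-off
    }
    if (last == null) {
      throw new IllegalStateException("maxPasses must be positive");
    }
    throw last;
  }
}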
> > > > >
> > > > >
> > > > > >
> > > > > > --
> > > > > > Ravi
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Mar 6, 2014 at 3:34 PM, Ravikumar Govindarajan <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > > > What do you think of giving extra leeway for shard-server
> > > > > > > failover cases?
> > > > > > >
> > > > > > > Ex: Whenever a shard-server process gets killed, the
> > > > > > > controller-node does not immediately update the layout, but
> > > > > > > rather marks it as a suspect.
> > > > > > >
> > > > > > > When we have a read-only back-up of the shard, searches can
> > > > > > > continue unhindered. Indexing during this time can be diverted
> > > > > > > to a queue, which will store and retry ops when the
> > > > > > > shard-server comes online again.
> > > > > > >
> > > > > > > If, over a configured number of attempts/amount of time, the
> > > > > > > shard-server does not come up, then one controller-server can
> > > > > > > authoritatively mark it as down and update the layout.
> > > > > > >
> > > > > > > --
> > > > > > > Ravi
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
