We have migrated about 30% of our data into Blur, but our servers were struggling to index the remaining 60% of the data while also handling new incoming documents for the already-migrated 30% of the user base...

We wanted to isolate migration (data pumping) from normal indexing/search requests, and Blur made it very easy. In MasterBasedFactory, I reserve a shard range for migration and make sure only a certain set of machines serves it. Once the shards reach a desired size (15-20 GB), I de-reserve them; normal machines pick them up and start serving incoming requests immediately. Newer reserved shards are then allocated so that our data pumping continues uninterrupted.
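Roughly, the reservation hook looks like the sketch below. This is illustrative only: MasterBasedFactory is our internal layout factory, and the class/method names here (MigrationAwareLayoutFactory, pickServer, deReserveIfFull) are invented for the example, not actual Blur APIs.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.NavigableSet;
    import java.util.Set;

    // Illustrative sketch only -- not actual Blur code. Assumes a custom layout
    // factory in which a contiguous shard range can be reserved for migration
    // traffic and served only by a dedicated set of "migration" machines.
    public class MigrationAwareLayoutFactory {

      private static final long TARGET_SHARD_BYTES = 20L * 1024 * 1024 * 1024; // ~15-20 GB target

      private final Set<String> migrationServers;         // machines dedicated to data pumping
      private final NavigableSet<Integer> reservedShards; // shard ids reserved for migration

      public MigrationAwareLayoutFactory(Set<String> migrationServers,
                                         NavigableSet<Integer> reservedShards) {
        this.migrationServers = migrationServers;
        this.reservedShards = reservedShards;
      }

      /** Reserved shards are laid out only on the migration machines. */
      public String pickServer(int shardId, List<String> allServers) {
        if (reservedShards.contains(shardId)) {
          List<String> candidates = new ArrayList<>(migrationServers);
          return candidates.get(shardId % candidates.size());
        }
        return allServers.get(shardId % allServers.size()); // normal layout for everything else
      }

      /** Once a reserved shard reaches the target size, release it back to the normal pool. */
      public void deReserveIfFull(int shardId, long shardSizeBytes) {
        if (shardSizeBytes >= TARGET_SHARD_BYTES) {
          reservedShards.remove(shardId); // normal servers pick it up on the next layout pass
        }
      }
    }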
This was a big problem for us, and the hidden gems of Blur continue to surprise and help us so much. Thanks a lot...

--
Ravi

On Wed, Mar 19, 2014 at 12:07 PM, Ravikumar Govindarajan <[email protected]> wrote:

> Sure, will take up the 0.2.2 codebase. Thanks for all your help.
>
> On Tue, Mar 18, 2014 at 4:24 PM, Aaron McCurry <[email protected]> wrote:
>
>> On Tue, Mar 18, 2014 at 1:30 AM, Ravikumar Govindarajan <[email protected]> wrote:
>>
>>>> If I understand this one: favor the primary response until a certain amount of time has passed, then fall back to the secondary response, assuming it's available to return.
>>>
>>> Exactly. This is one such option. Another option is the first-past-the-post.
>>>
>>>> Buffer cache? Are you referring to block cache?
>>>
>>> Yup. Was referring to the block cache here. But like you said, we can just let it fall off the LRU.
>>>
>>>> The interesting thing here is that Blur is fully committed to disk (HDFS) upon each mutate.
>>>
>>> I think this is a new feature that I have missed in Blur. Will for sure check it out. This also auto-solves the stale-read issue.
>>>
>>> The problem now is, I am doing quite low-level changes on top of Blur. Some of them are:
>>>
>>> 1. Online shard creation
>>> 2. Externalizing the RowId->Shard mapping via BlurPartitioner
>>> 3. Splitting shards upon reaching a configured size
>>> 4. Secondary read-only shard for availability...
>>
>> I would love to hear more about the details of the implementations of these. :-)
>>
>>> and many more such things needed for our app.
>>>
>>> Hope to share and get feedback on these changes from the Blur community once the system survives a couple of production cycles.
>>
>> That would be awesome. Based on your other email, I would strongly recommend you take a look at the 0.2.2 codebase. It has MANY fixes, performance improvements, and stability enhancements. Let us know if you have any questions.
>>
>> Aaron
>>
>>> --
>>> Ravi
>>>
>>> On Mon, Mar 17, 2014 at 7:17 PM, Aaron McCurry <[email protected]> wrote:
>>>
>>>> On Sat, Mar 15, 2014 at 12:57 PM, Ravikumar Govindarajan <[email protected]> wrote:
>>>>
>>>>> Aaron,
>>>>>
>>>>> I was thinking about another way of utilizing read-only shards.
>>>>>
>>>>> Instead of logic/intelligence for finding a primary replica that is struggling/down, can we opt for pushing the logic to the client side?
>>>>>
>>>>> We can take a few approaches, as below:
>>>>>
>>>>> 1. Query both primary/secondary shards in parallel and return whichever comes first
>>>>>
>>>>> 2. Query both primary/secondary shards in parallel. Wait for the primary response up to a configured delay. If not forthcoming, return the secondary's response
>>>>
>>>> If I understand this one: favor the primary response until a certain amount of time has passed, then fall back to the secondary response, assuming it's available to return.
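The client-side fallback in approach 2 could look roughly like this on our side. This is only a sketch: the queryPrimary/querySecondary suppliers are placeholders for whatever actually issues the search against each replica, not Blur API.

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;
    import java.util.function.Supplier;

    public final class PrimaryFirstQuery {

      /**
       * Fire the query at both replicas in parallel. Prefer the primary's answer,
       * but only wait for it up to primaryDeadlineMs; after that, return whatever
       * the secondary produces (a possibly stale, last-commit view of the index).
       */
      public static <T> T query(Supplier<T> queryPrimary, Supplier<T> querySecondary,
                                long primaryDeadlineMs) throws Exception {
        CompletableFuture<T> primary = CompletableFuture.supplyAsync(queryPrimary);
        CompletableFuture<T> secondary = CompletableFuture.supplyAsync(querySecondary);
        try {
          return primary.get(primaryDeadlineMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException slowPrimary) {
          return secondary.get(); // fall back to the read-only replica
        }
      }
    }

Approach 1 (first-past-the-post) would be the same idea with CompletableFuture.anyOf(primary, secondary) instead of the timed get.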
>>>>> These are useful only when the client agrees to a "stale-read" scenario. "Stale-read" in this case will be the last commit of the index.
>>>>>
>>>>> What I am aiming at is that, in the case of layout-conscious apps [the layout does not change when a VM update/crash/hang is restarted], we can always fall back on replica reads, resulting in greater availability but lesser consistency.
>>>>>
>>>>> A secondary-replica layout needs to be present in ZK. Replica shards should always be served from a server other than the primary. Maybe we can switch off the buffer cache for replica reads, as it is used only temporarily.
>>>>
>>>> Buffer cache? Are you referring to block cache? Or a query cache? Just as an FYI, Blur's query cache is currently disabled. As for the block cache, maybe. The block cache seems to help performance quite a bit and usually does so at little cost. Also, we could flush the secondary shard from the cache from time to time. Or we could just let it fall out of the LRU.
>>>>
>>>>> 95% of apps queue their indexing operations and can always retry after the primary comes back online.
>>>>
>>>> The interesting thing here is that Blur is fully committed to disk (HDFS) upon each mutate. So assuming that the secondary shard has refreshed, the primary shard being down just means that you can't write to that shard. Reads should be in the same state.
>>>>
>>>>> Please let me know your views on this.
>>>>
>>>> I like all these ideas; the only thing I would add is that we would need to build these sorts of options into Blur on a configured per-table basis. Querying both primary and secondary shards at the same time could produce the most consistent response times, but at the cost of CPU resources (obviously).
>>>>
>>>> Thanks for the thoughts and ideas! I like it!
>>>>
>>>> Aaron
>>>>
>>>>> --
>>>>> Ravi
>>>>>
>>>>> On Sat, Mar 8, 2014 at 8:56 PM, Aaron McCurry <[email protected]> wrote:
>>>>>
>>>>>> On Fri, Mar 7, 2014 at 5:42 AM, Ravikumar Govindarajan <[email protected]> wrote:
>>>>>>
>>>>>>>> Well, it works that way for OOMs and for when the process drops hard (think kill -9). However, when a shard server is shut down it currently ends its session in ZooKeeper, thus triggering a layout change.
>>>>>>>
>>>>>>> Yes, maybe we can have a config to determine whether it should end or maintain the session in ZK when doing a normal shutdown and subsequent restart. This way, both MTTR-conscious and layout-conscious settings can be supported.
>>>>>>
>>>>>> That's a neat idea. Once we have shards being served on multiple servers we should definitely take a look at this. When we implement the multi-shard serving, I would guess that there will be 2 layout strategies (they might be implemented together):
>>>>>>
>>>>>> 1. Would be to get the N replicas online on different servers.
>>>>>> 2. Would be to pick the writing leader for the shard, assuming that it's needed.
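As an illustration of what that secondary-replica layout in ZK might look like on our side, here is a rough sketch. The znode paths (/blur-replicas/...) and class names are invented for the example, and the parent znodes are assumed to already exist.

    import java.nio.charset.StandardCharsets;
    import java.util.List;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ReplicaLayoutWriter {

      private final ZooKeeper zk;

      public ReplicaLayoutWriter(ZooKeeper zk) {
        this.zk = zk;
      }

      /**
       * Record a primary server plus N replica servers for one shard, e.g.
       *   /blur-replicas/TABLE/shard-0007/primary   -> shard1:40020
       *   /blur-replicas/TABLE/shard-0007/replica-0 -> shard2:40020
       * Replicas are intentionally placed on servers other than the primary.
       */
      public void publish(String table, String shard, String primary, List<String> replicas)
          throws Exception {
        String base = "/blur-replicas/" + table + "/" + shard;
        create(base + "/primary", primary);
        for (int i = 0; i < replicas.size(); i++) {
          create(base + "/replica-" + i, replicas.get(i));
        }
      }

      private void create(String path, String value) throws Exception {
        zk.create(path, value.getBytes(StandardCharsets.UTF_8),
            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
      }
    }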
>>>>>>> How do you think we can detect that a particular shard server is struggling or shut down, and hence that incoming search requests need to go to some other server?
>>>>>>>
>>>>>>> I am listing a few paths off the top of my head:
>>>>>>>
>>>>>>> 1. Process baby-sitters like supervisord, alerting the controllers
>>>>>>> 2. Tracking the first network exception in the controller and diverting to the read-only instance. Periodically, maybe, re-try
>>>>>>> 3. Take a statistics-based decision, based on previous response times etc.
>>>>>>
>>>>>> Adding to this one, and this may be obvious, but measure the response time in comparison with other shards. Meaning, if the entire cluster is experiencing an increase in load and all response times are increasing, we wouldn't want to start killing off shard servers inadvertently. Look for outliers.
>>>>>>
>>>>>>> 4. Build some kind of leasing mechanism in ZK etc...
>>>>>>
>>>>>> I think that all of these are good approaches. Likely, to determine that a node is misbehaving and should be killed/not used anymore, we would want multiple ways to measure that condition and then vote on the need to kick it out.
>>>>>>
>>>>>> Aaron
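The outlier check for option 3 could start as simply as the sketch below: compare each server's recent average latency against the cluster-wide average rather than a fixed threshold, so a uniformly loaded cluster doesn't get its servers kicked out. The class name, the outlierFactor parameter, and how the latencies are gathered are all assumptions for the example.

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class ShardLatencyOutliers {

      /**
       * Flag servers whose average response time is far above the cluster-wide
       * average. A robust version would use the median and track trends over time.
       */
      public static Set<String> findSuspects(Map<String, Double> avgLatencyMsByServer,
                                             double outlierFactor) {
        double clusterAvg = avgLatencyMsByServer.values().stream()
            .mapToDouble(Double::doubleValue).average().orElse(0.0);
        Set<String> suspects = new HashSet<>();
        for (Map.Entry<String, Double> e : avgLatencyMsByServer.entrySet()) {
          if (clusterAvg > 0 && e.getValue() > clusterAvg * outlierFactor) {
            suspects.add(e.getKey()); // candidate for diverting reads to a replica
          }
        }
        return suspects;
      }
    }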
>>>>>>> --
>>>>>>> Ravi
>>>>>>>
>>>>>>> On Fri, Mar 7, 2014 at 8:01 AM, Aaron McCurry <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Thu, Mar 6, 2014 at 6:30 AM, Ravikumar Govindarajan <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I came to know about the zk.session.timeout variable just now, while reading more about this problem.
>>>>>>>>>
>>>>>>>>> This will only trigger the dead-node notification after the configured timeout is exceeded. Setting it to 3-4 mins must be fine for OOMs and rolling restarts.
>>>>>>>>
>>>>>>>> Well, it works that way for OOMs and for when the process drops hard (think kill -9). However, when a shard server is shut down it currently ends its session in ZooKeeper, thus triggering a layout change.
>>>>>>>>
>>>>>>>>> The only extra thing I am looking for is to divert search calls to a read-only shard instance during this 3-4 minute window, to avoid mini-outages.
>>>>>>>>
>>>>>>>> Yes, and I think that the controllers will automatically spread the queries across those servers that are online. The BlurClient class already takes a list of connection strings and treats all connections as equals. For example, its current use is to provide the client with all of the controllers' connection strings. Internally, if any one of the controllers goes down or has a network issue, another controller is automatically retried without the user having to do anything. There is back-off, ping, and pooling logic in the BlurClientManager that the BlurClient utilizes.
>>>>>>>>
>>>>>>>> Aaron
>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Ravi
>>>>>>>>>
>>>>>>>>> On Thu, Mar 6, 2014 at 3:34 PM, Ravikumar Govindarajan <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> What do you think of giving extra leeway for shard-server failover cases?
>>>>>>>>>>
>>>>>>>>>> Ex: Whenever a shard-server process gets killed, the controller node does not immediately update the layout, but rather marks the server as a suspect.
>>>>>>>>>>
>>>>>>>>>> When we have a read-only backup of the shard, searches can continue unhindered. Indexing during this time can be diverted to a queue, which will store and retry the ops when the shard server comes online again.
>>>>>>>>>>
>>>>>>>>>> Over a configured number of attempts/amount of time, if the shard server does not come up, then one controller server can authoritatively mark it as down and update the layout.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Ravi
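The "divert indexing to a queue and retry" part of that original proposal could be sketched as below. This is hypothetical; a real version would persist the queue (so buffered mutations survive a restart) rather than hold it in memory, and MutationSink stands in for whatever issues the actual Blur mutate call.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class MutationRetryQueue<M> {

      private final BlockingQueue<M> pending = new LinkedBlockingQueue<>();

      /** While the primary shard server is only "suspect", buffer writes instead of failing them. */
      public void enqueue(M mutation) {
        pending.add(mutation);
      }

      /** Once the shard server is back online, replay everything in arrival order. */
      public void replay(MutationSink<M> sink) throws Exception {
        M mutation;
        while ((mutation = pending.poll()) != null) {
          sink.apply(mutation);
        }
      }

      /** Abstraction over the actual write path. */
      public interface MutationSink<T> {
        void apply(T mutation) throws Exception;
      }
    }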
