Aaron, I was thinking about another way of utilizing read-only shards.
Instead of building logic/intelligence for detecting that a primary replica is struggling or down, can we opt for pushing the logic to the client side? We can take a few approaches, as below:

1. Query both primary and secondary shards in parallel and return whichever response comes first.
2. Query both primary and secondary shards in parallel. Wait for the primary's response for a configured delay; if it is not forthcoming, return the secondary's response (a rough sketch follows below).

These are useful only when the client agrees to a "stale-read" scenario. A "stale read" in this case means reading the last commit of the index.

What I am aiming at is that, in the case of layout-conscious apps [where the layout does not change when a VM is restarted after an update/crash/hang], we can always fall back on replica reads, resulting in greater availability but lesser consistency.

A secondary-replica layout needs to be present in ZK. Replica shards should always be served from a server other than the primary. Maybe we can switch off the buffer-cache for replica reads, as it is used only temporarily.

95% of apps queue their indexing operations and can always retry after the primary comes back online.
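Here is a rough sketch of option 2, just to make the idea concrete. SearchClient, Query and Result are hypothetical placeholders, not the actual Blur client API:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of option 2: fire primary and secondary reads in parallel,
// prefer the primary, and fall back to the (possibly stale) secondary
// after a configured delay. SearchClient, Query and Result are
// hypothetical placeholders, not the real Blur client API.
public class PreferPrimaryReader {

  public interface SearchClient { Result search(Query query); }
  public interface Query {}
  public interface Result {}

  private final SearchClient primary;
  private final SearchClient secondary;
  private final long primaryWaitMillis;

  public PreferPrimaryReader(SearchClient primary, SearchClient secondary,
                             long primaryWaitMillis) {
    this.primary = primary;
    this.secondary = secondary;
    this.primaryWaitMillis = primaryWaitMillis;
  }

  public Result search(Query query) throws Exception {
    CompletableFuture<Result> fromPrimary =
        CompletableFuture.supplyAsync(() -> primary.search(query));
    CompletableFuture<Result> fromSecondary =
        CompletableFuture.supplyAsync(() -> secondary.search(query));
    try {
      // Give the primary a configured head start.
      return fromPrimary.get(primaryWaitMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Primary did not answer in time; accept a stale read (last commit
      // of the index) from the secondary replica.
      return fromSecondary.get();
    }
  }
}

Option 1 would be the same two futures combined with CompletableFuture.anyOf(...), returning whichever completes first.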
Please let me know your views on this.

--
Ravi


On Sat, Mar 8, 2014 at 8:56 PM, Aaron McCurry <[email protected]> wrote:

> On Fri, Mar 7, 2014 at 5:42 AM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > > Well it works that way for OOMs and for when the process drops hard
> > > (think kill -9). However when a shard server is shut down it currently
> > > ends its session in ZooKeeper, thus triggering a layout change.
> >
> > Yes, maybe we can have a config to determine whether it should
> > end/maintain the session in ZK when doing a normal shutdown and a
> > subsequent restart. This way, both MTTR-conscious and layout-conscious
> > settings can be supported.
>
> That's a neat idea. Once we have shards being served on multiple servers
> we should definitely take a look at this. When we implement the
> multi-shard serving I would guess that there will be 2 layout strategies
> (they might be implemented together).
>
> 1. Would be to get the N replicas online on different servers.
> 2. Would be to pick the writing leader for the shard, assuming that it's
>    needed.
>
> > How do you think we can detect that a particular shard-server is
> > struggling/shut down and hence incoming search requests need to go to
> > some other server?
> >
> > I am listing a few paths off the top of my head:
> >
> > 1. Process baby-sitters like supervisord, alerting controllers
> > 2. Tracking the first network exception in the controller and diverting
> >    to a read-only instance. Periodically maybe re-try
> > 3. Take a statistics-based decision, based on previous response times,
> >    etc.
>
> Adding to this one, and this may be obvious, but measuring the response
> time in comparison with other shards. Meaning if the entire cluster is
> experiencing an increase in load and all response times are increasing,
> we wouldn't want to start killing off shard servers inadvertently.
> Looking for outliers.
>
> > 4. Build some kind of leasing mechanism in ZK etc.
>
> I think that all of these are good approaches. Likely, to determine that
> a node is misbehaving and should be killed/not used anymore, we would
> want multiple ways to measure that condition and then vote on the need
> to kick it out.
>
> Aaron
>
> > --
> > Ravi
> >
> > On Fri, Mar 7, 2014 at 8:01 AM, Aaron McCurry <[email protected]> wrote:
> >
> > > On Thu, Mar 6, 2014 at 6:30 AM, Ravikumar Govindarajan <
> > > [email protected]> wrote:
> > >
> > > > I came to know about the zk.session.timeout variable just now,
> > > > while reading more about this problem.
> > > >
> > > > This will only trigger the dead-node notification after the
> > > > configured timeout exceeds. Setting it to 3-4 mins must be fine for
> > > > OOMs and rolling restarts.
> > >
> > > Well it works that way for OOMs and for when the process drops hard
> > > (think kill -9). However when a shard server is shut down it
> > > currently ends its session in ZooKeeper, thus triggering a layout
> > > change.
> > >
> > > > The only extra stuff I am looking for is to divert search calls to
> > > > a read-only shard instance during this 3-4 mins time to avoid
> > > > mini-outages.
> > >
> > > Yes, and I think that the controllers will automatically spread the
> > > queries across those servers that are online. The BlurClient class
> > > already takes a list of connection strings and treats all connections
> > > as equals. For example, its current use is to provide the client with
> > > all the controllers' connection strings. Internally, if any one of
> > > the controllers goes down or has a network issue, another controller
> > > is automatically retried without the user having to do anything.
> > > There is back-off, ping, and pooling logic in the BlurClientManager
> > > that the BlurClient utilizes.
> > >
> > > Aaron
> > >
> > > > --
> > > > Ravi
> > > >
> > > > On Thu, Mar 6, 2014 at 3:34 PM, Ravikumar Govindarajan <
> > > > [email protected]> wrote:
> > > >
> > > > > What do you think of giving an extra leeway for shard-server
> > > > > failover cases?
> > > > >
> > > > > Ex: Whenever a shard-server process gets killed, the controller
> > > > > node does not immediately update the layout, but rather marks it
> > > > > as a suspect.
> > > > >
> > > > > When we have a read-only back-up of the shard, searches can
> > > > > continue unhindered. Indexing during this time can be diverted to
> > > > > a queue, which will store and retry ops when the shard server
> > > > > comes online again.
> > > > >
> > > > > Over a configured number of attempts/time, if the shard server
> > > > > does not come up, then one controller server can authoritatively
> > > > > mark it as down and update the layout.
> > > > >
> > > > > --
> > > > > Ravi
