We have migrated about 30% of our data into Blur, but our servers were struggling to index the remaining 60% of the data while also handling new incoming documents for the already-migrated 30% of the user base...

We wanted to isolate migration (data pumping) from normal indexing/search requests, and Blur made it very easy. In MasterBasedFactory, I reserve a shard range for migration and make sure only a certain set of machines serves it. Once the shards reach a desired size (15-20 GB), I de-reserve them; normal machines pick them up and start serving incoming requests immediately. Newer reserved shards are then allocated so that our data pumping continues uninterrupted.
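Roughly, the reservation hook looks like the sketch below. This is illustrative only: MasterBasedFactory is our internal layout factory, and the class/method names here (MigrationAwareLayoutFactory, pickServer, deReserveIfFull) are invented for the example, not actual Blur APIs.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.NavigableSet;
    import java.util.Set;

    // Illustrative sketch only -- not actual Blur code. Assumes a custom layout
    // factory in which a contiguous shard range can be reserved for migration
    // traffic and served only by a dedicated set of "migration" machines.
    public class MigrationAwareLayoutFactory {

      private static final long TARGET_SHARD_BYTES = 20L * 1024 * 1024 * 1024; // ~15-20 GB target

      private final Set<String> migrationServers;         // machines dedicated to data pumping
      private final NavigableSet<Integer> reservedShards; // shard ids reserved for migration

      public MigrationAwareLayoutFactory(Set<String> migrationServers,
                                         NavigableSet<Integer> reservedShards) {
        this.migrationServers = migrationServers;
        this.reservedShards = reservedShards;
      }

      /** Reserved shards are laid out only on the migration machines. */
      public String pickServer(int shardId, List<String> allServers) {
        if (reservedShards.contains(shardId)) {
          List<String> candidates = new ArrayList<>(migrationServers);
          return candidates.get(shardId % candidates.size());
        }
        return allServers.get(shardId % allServers.size()); // normal layout for everything else
      }

      /** Once a reserved shard reaches the target size, release it back to the normal pool. */
      public void deReserveIfFull(int shardId, long shardSizeBytes) {
        if (shardSizeBytes >= TARGET_SHARD_BYTES) {
          reservedShards.remove(shardId); // normal servers pick it up on the next layout pass
        }
      }
    }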
This was a big problem for us, and the hidden gems of Blur continue to surprise and help us so much. Thanks a lot...

--
Ravi

On Wed, Mar 19, 2014 at 12:07 PM, Ravikumar Govindarajan <[email protected]> wrote:

> Sure, will take up the 0.2.2 codebase. Thanks for all your help.
>
> On Tue, Mar 18, 2014 at 4:24 PM, Aaron McCurry <[email protected]> wrote:
>
>> On Tue, Mar 18, 2014 at 1:30 AM, Ravikumar Govindarajan <[email protected]> wrote:
>>
>>>> If I understand this one: favor the primary response until a certain amount of time has passed, then fall back to the secondary response, assuming it's available to return.
>>>
>>> Exactly. This is one such option. Another option is the first-past-the-post.
>>>
>>>> Buffer cache? Are you referring to block cache?
>>>
>>> Yup. Was referring to the block cache here. But like you said, we can just let it fall off the LRU.
>>>
>>>> The interesting thing here is that Blur is fully committed to disk (HDFS) upon each mutate.
>>>
>>> I think this is a new feature that I have missed in Blur. Will for sure check it out. This also auto-solves the stale-read issue.
>>>
>>> The problem now is, I am doing quite low-level changes on top of Blur. Some of them are:
>>>
>>> 1. Online shard creation
>>> 2. Externalizing the RowId->Shard mapping via BlurPartitioner
>>> 3. Splitting shards upon reaching a configured size
>>> 4. Secondary read-only shard for availability...
>>
>> I would love to hear more about the details of the implementations of these. :-)
>>
>>> and many more such things needed for our app.
>>>
>>> Hope to share and get feedback on these changes from the Blur community once the system survives a couple of production cycles.
>>
>> That would be awesome. Based on your other email, I would strongly recommend you take a look at the 0.2.2 codebase. It has MANY fixes, performance improvements, and stability enhancements. Let us know if you have any questions.
>>
>> Aaron
>>
>>> --
>>> Ravi
>>>
>>> On Mon, Mar 17, 2014 at 7:17 PM, Aaron McCurry <[email protected]> wrote:
>>>
>>>> On Sat, Mar 15, 2014 at 12:57 PM, Ravikumar Govindarajan <[email protected]> wrote:
>>>>
>>>>> Aaron,
>>>>>
>>>>> I was thinking about another way of utilizing read-only shards.
>>>>>
>>>>> Instead of logic/intelligence for finding a primary replica that is struggling/down, can we opt for pushing the logic to the client side?
>>>>>
>>>>> We can take a few approaches, as below:
>>>>>
>>>>> 1. Query both primary/secondary shards in parallel and return whichever comes first
>>>>>
>>>>> 2. Query both primary/secondary shards in parallel. Wait for the primary response up to a configured delay. If not forthcoming, return the secondary's response
>>>>
>>>> If I understand this one: favor the primary response until a certain amount of time has passed, then fall back to the secondary response, assuming it's available to return.
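The client-side fallback in approach 2 could look roughly like this on our side. This is only a sketch: the queryPrimary/querySecondary suppliers are placeholders for whatever actually issues the search against each replica, not Blur API.

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;
    import java.util.function.Supplier;

    public final class PrimaryFirstQuery {

      /**
       * Fire the query at both replicas in parallel. Prefer the primary's answer,
       * but only wait for it up to primaryDeadlineMs; after that, return whatever
       * the secondary produces (a possibly stale, last-commit view of the index).
       */
      public static <T> T query(Supplier<T> queryPrimary, Supplier<T> querySecondary,
                                long primaryDeadlineMs) throws Exception {
        CompletableFuture<T> primary = CompletableFuture.supplyAsync(queryPrimary);
        CompletableFuture<T> secondary = CompletableFuture.supplyAsync(querySecondary);
        try {
          return primary.get(primaryDeadlineMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException slowPrimary) {
          return secondary.get(); // fall back to the read-only replica
        }
      }
    }

Approach 1 (first-past-the-post) would be the same idea with CompletableFuture.anyOf(primary, secondary) instead of the timed get.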
>>>>> These are useful only when the client agrees to a "stale-read" scenario. "Stale-read" in this case will be the last commit of the index.
>>>>>
>>>>> What I am aiming at is that, in the case of layout-conscious apps [the layout does not change when a VM update/crash/hang is restarted], we can always fall back on replica reads, resulting in greater availability but lesser consistency.
>>>>>
>>>>> A secondary-replica layout needs to be present in ZK. Replica shards should always be served from a server other than the primary. Maybe we can switch off the buffer cache for replica reads, as it is used only temporarily.
>>>>
>>>> Buffer cache? Are you referring to block cache? Or a query cache? Just as an FYI, Blur's query cache is currently disabled. As for the block cache, maybe. The block cache seems to help performance quite a bit and usually does so at little cost. Also, we could flush the secondary shard from the cache from time to time. Or we could just let it fall out of the LRU.
>>>>
>>>>> 95% of apps queue their indexing operations and can always retry after the primary comes back online.
>>>>
>>>> The interesting thing here is that Blur is fully committed to disk (HDFS) upon each mutate. So assuming that the secondary shard has refreshed, the primary shard being down just means that you can't write to that shard. Reads should be in the same state.
>>>>
>>>>> Please let me know your views on this.
>>>>
>>>> I like all these ideas; the only thing I would add is that we would need to build these sorts of options into Blur on a configured per-table basis. Querying both primary and secondary shards at the same time could produce the most consistent response times, but at the cost of CPU resources (obviously).
>>>>
>>>> Thanks for the thoughts and ideas! I like it!
>>>>
>>>> Aaron
>>>>
>>>>> --
>>>>> Ravi
>>>>>
>>>>> On Sat, Mar 8, 2014 at 8:56 PM, Aaron McCurry <[email protected]> wrote:
>>>>>
>>>>>> On Fri, Mar 7, 2014 at 5:42 AM, Ravikumar Govindarajan <[email protected]> wrote:
>>>>>>
>>>>>>>> Well, it works that way for OOMs and for when the process drops hard (think kill -9). However, when a shard server is shut down it currently ends its session in ZooKeeper, thus triggering a layout change.
>>>>>>>
>>>>>>> Yes, maybe we can have a config to determine whether it should end or maintain the session in ZK when doing a normal shutdown and subsequent restart. This way, both MTTR-conscious and layout-conscious settings can be supported.
>>>>>>
>>>>>> That's a neat idea. Once we have shards being served on multiple servers we should definitely take a look at this. When we implement the multi-shard serving, I would guess that there will be 2 layout strategies (they might be implemented together):
>>>>>>
>>>>>> 1. Would be to get the N replicas online on different servers.
>>>>>> 2. Would be to pick the writing leader for the shard, assuming that it's needed.
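As an illustration of what that secondary-replica layout in ZK might look like on our side, here is a rough sketch. The znode paths (/blur-replicas/...) and class names are invented for the example, and the parent znodes are assumed to already exist.

    import java.nio.charset.StandardCharsets;
    import java.util.List;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ReplicaLayoutWriter {

      private final ZooKeeper zk;

      public ReplicaLayoutWriter(ZooKeeper zk) {
        this.zk = zk;
      }

      /**
       * Record a primary server plus N replica servers for one shard, e.g.
       *   /blur-replicas/TABLE/shard-0007/primary   -> shard1:40020
       *   /blur-replicas/TABLE/shard-0007/replica-0 -> shard2:40020
       * Replicas are intentionally placed on servers other than the primary.
       */
      public void publish(String table, String shard, String primary, List<String> replicas)
          throws Exception {
        String base = "/blur-replicas/" + table + "/" + shard;
        create(base + "/primary", primary);
        for (int i = 0; i < replicas.size(); i++) {
          create(base + "/replica-" + i, replicas.get(i));
        }
      }

      private void create(String path, String value) throws Exception {
        zk.create(path, value.getBytes(StandardCharsets.UTF_8),
            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
      }
    }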
>>>>>>> How do you think we can detect that a particular shard server is struggling or shut down, and hence that incoming search requests need to go to some other server?
>>>>>>>
>>>>>>> I am listing a few paths off the top of my head:
>>>>>>>
>>>>>>> 1. Process baby-sitters like supervisord, alerting the controllers
>>>>>>> 2. Tracking the first network exception in the controller and diverting to the read-only instance. Periodically, maybe, re-try
>>>>>>> 3. Take a statistics-based decision, based on previous response times etc.
>>>>>>
>>>>>> Adding to this one, and this may be obvious, but measure the response time in comparison with other shards. Meaning, if the entire cluster is experiencing an increase in load and all response times are increasing, we wouldn't want to start killing off shard servers inadvertently. Look for outliers.
>>>>>>
>>>>>>> 4. Build some kind of leasing mechanism in ZK etc...
>>>>>>
>>>>>> I think that all of these are good approaches. Likely, to determine that a node is misbehaving and should be killed/not used anymore, we would want multiple ways to measure that condition and then vote on the need to kick it out.
>>>>>>
>>>>>> Aaron
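The outlier check for option 3 could start as simply as the sketch below: compare each server's recent average latency against the cluster-wide average rather than a fixed threshold, so a uniformly loaded cluster doesn't get its servers kicked out. The class name, the outlierFactor parameter, and how the latencies are gathered are all assumptions for the example.

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class ShardLatencyOutliers {

      /**
       * Flag servers whose average response time is far above the cluster-wide
       * average. A robust version would use the median and track trends over time.
       */
      public static Set<String> findSuspects(Map<String, Double> avgLatencyMsByServer,
                                             double outlierFactor) {
        double clusterAvg = avgLatencyMsByServer.values().stream()
            .mapToDouble(Double::doubleValue).average().orElse(0.0);
        Set<String> suspects = new HashSet<>();
        for (Map.Entry<String, Double> e : avgLatencyMsByServer.entrySet()) {
          if (clusterAvg > 0 && e.getValue() > clusterAvg * outlierFactor) {
            suspects.add(e.getKey()); // candidate for diverting reads to a replica
          }
        }
        return suspects;
      }
    }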
>>>>>>> --
>>>>>>> Ravi
>>>>>>>
>>>>>>> On Fri, Mar 7, 2014 at 8:01 AM, Aaron McCurry <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Thu, Mar 6, 2014 at 6:30 AM, Ravikumar Govindarajan <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I came to know about the zk.session.timeout variable just now, while reading more about this problem.
>>>>>>>>>
>>>>>>>>> This will only trigger the dead-node notification after the configured timeout is exceeded. Setting it to 3-4 mins must be fine for OOMs and rolling restarts.
>>>>>>>>
>>>>>>>> Well, it works that way for OOMs and for when the process drops hard (think kill -9). However, when a shard server is shut down it currently ends its session in ZooKeeper, thus triggering a layout change.
>>>>>>>>
>>>>>>>>> The only extra thing I am looking for is to divert search calls to a read-only shard instance during this 3-4 minute window, to avoid mini-outages.
>>>>>>>>
>>>>>>>> Yes, and I think that the controllers will automatically spread the queries across those servers that are online. The BlurClient class already takes a list of connection strings and treats all connections as equals. For example, its current use is to provide the client with all of the controllers' connection strings. Internally, if any one of the controllers goes down or has a network issue, another controller is automatically retried without the user having to do anything. There is back-off, ping, and pooling logic in the BlurClientManager that the BlurClient utilizes.
>>>>>>>>
>>>>>>>> Aaron
>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Ravi
>>>>>>>>>
>>>>>>>>> On Thu, Mar 6, 2014 at 3:34 PM, Ravikumar Govindarajan <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> What do you think of giving extra leeway for shard-server failover cases?
>>>>>>>>>>
>>>>>>>>>> Ex: Whenever a shard-server process gets killed, the controller node does not immediately update the layout, but rather marks the server as a suspect.
>>>>>>>>>>
>>>>>>>>>> When we have a read-only backup of the shard, searches can continue unhindered. Indexing during this time can be diverted to a queue, which will store and retry the ops when the shard server comes online again.
>>>>>>>>>>
>>>>>>>>>> Over a configured number of attempts/amount of time, if the shard server does not come up, then one controller server can authoritatively mark it as down and update the layout.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Ravi
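The "divert indexing to a queue and retry" part of that original proposal could be sketched as below. This is hypothetical; a real version would persist the queue (so buffered mutations survive a restart) rather than hold it in memory, and MutationSink stands in for whatever issues the actual Blur mutate call.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class MutationRetryQueue<M> {

      private final BlockingQueue<M> pending = new LinkedBlockingQueue<>();

      /** While the primary shard server is only "suspect", buffer writes instead of failing them. */
      public void enqueue(M mutation) {
        pending.add(mutation);
      }

      /** Once the shard server is back online, replay everything in arrival order. */
      public void replay(MutationSink<M> sink) throws Exception {
        M mutation;
        while ((mutation = pending.poll()) != null) {
          sink.apply(mutation);
        }
      }

      /** Abstraction over the actual write path. */
      public interface MutationSink<T> {
        void apply(T mutation) throws Exception;
      }
    }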
