Thanks Andy, this is in line with what I was asking/thinking of.
To separate concerns a bit: the question is whether HRegionServer, instead of exiting, could have a standby state (maintaining the current region state would be optional, depending on whether we have shadow regions or not) where it doesn't serve requests but is still visible to the cluster. This way we could "wake it up" if we decide that is safe (e.g. we remain consistent). This scenario seems to solve both my laptop-standby problem and (potentially fast) recovery after a network partition.

Cosmin

On 6/3/14, 3:42 PM, "Andrew Purtell" <[email protected]> wrote:

>Although a dev laptop suspend and resume is equivalent to a total network
>failure for the elapsed time, with all that implies, perhaps after
>HBASE-10070 goes in a RS could stay up with its regions switched to
>read-only replica mode and reconnect; the master, upon discovering all
>regions still available on the cluster but in read-only mode (no primary),
>could then reassign primaries - the net effect being that indeed no
>process needs restarting by a supervisor.
>
>
>On Tue, Jun 3, 2014 at 3:27 PM, Andrew Purtell <[email protected]>
>wrote:
>
>> Stating the obvious: you just need to restart the RegionServer because it
>> shut down. We use ZooKeeper for tracking server liveness, and from the
>> ZooKeeper perspective a sufficiently long time elapsed without a
>> heartbeat that the RegionServer's session expired. We've left this to
>> date to the user to handle with supervisory scripts, e.g. Puppet / Chef /
>> Daemontools. I suppose a RegionServer could try to reinitialize as a new
>> process, or the ./bin/hbase script could do this if you ask.
>>
>>
>> On Tue, Jun 3, 2014 at 2:46 PM, Cosmin Lehene <[email protected]> wrote:
>>
>>> I just realized that, for years, I've restarted HBase countless times,
>>> every time my laptop comes out of standby.
>>> I know why I do this, but I also know I probably shouldn't have to. I
>>> don't need to do it with Hadoop or ZooKeeper or other services, and I
>>> wish I didn't need to with HBase either.
>>>
>>> So short term, I'd like to know if there is already a better way.
>>>
>>> Long term, I think this is a bigger, more fundamental resiliency aspect
>>> that perhaps is not trivial, but probably worth thinking about in the
>>> context of real deployments, and I wonder if there's something that
>>> tries to solve this already.
>>>
>>> Thoughts?
>>>
>>> Thanks,
>>> Cosmin
>>>
>>
>>
>>
>> --
>> Best regards,
>>
>> - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>>
>
>
>
>--
>Best regards,
>
> - Andy
>
>Problems worthy of attack prove their worth by hitting back. - Piet Hein
>(via Tom White)
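
[Editor's note: for context on the liveness mechanism Andrew describes, here is a minimal standalone sketch, assuming a stock ZooKeeper Java client and a local ensemble. The connect string, znode path, and timeout are illustrative only - this is not HBase's actual registration code. A server holds an ephemeral znode as its claim to liveness; a laptop suspend longer than the session timeout expires the session, ZooKeeper deletes the znode, and the rest of the cluster treats the server as dead. Since an expired session cannot be revived on the same handle, the RegionServer aborts today rather than resuming in place.]

import java.io.IOException;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class EphemeralLivenessDemo implements Watcher {

    private final CountDownLatch connected = new CountDownLatch(1);
    private final CountDownLatch expired = new CountDownLatch(1);

    @Override
    public void process(WatchedEvent event) {
        if (event.getState() == Event.KeeperState.SyncConnected) {
            connected.countDown();
        } else if (event.getState() == Event.KeeperState.Expired) {
            // The only way back from an expired session is a brand-new
            // handle, which is why the RegionServer aborts today instead
            // of resuming in place.
            expired.countDown();
        }
    }

    public void run() throws IOException, KeeperException, InterruptedException {
        int sessionTimeoutMs = 30_000; // in the spirit of zookeeper.session.timeout
        ZooKeeper zk = new ZooKeeper("localhost:2181", sessionTimeoutMs, this);
        connected.await();

        // The ephemeral node is this process's claim to liveness; ZooKeeper
        // deletes it automatically when the session expires, and peers
        // watching it then treat the server as dead.
        zk.create("/liveness-demo", new byte[0],
                Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // Suspend the machine for longer than sessionTimeoutMs and this
        // fires on resume.
        expired.await();
        System.out.println("Session expired; only a new handle can re-register.");
        zk.close();
    }

    public static void main(String[] args) throws Exception {
        new EphemeralLivenessDemo().run();
    }
}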

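[Editor's note: and a hypothetical Java stand-in for the supervisory-script approach (Puppet / Chef / Daemontools) Andrew mentions - restart the RegionServer whenever the process exits, e.g. after a session-expiry abort. The command path and backoff are illustrative, not a prescribed deployment.]

public class RegionServerWatchdog {
    public static void main(String[] args) throws Exception {
        while (true) {
            // Command and path are illustrative; running the RS in the
            // foreground lets waitFor() track its lifetime directly.
            Process rs = new ProcessBuilder("./bin/hbase", "regionserver", "start")
                    .inheritIO()
                    .start();
            int exit = rs.waitFor();
            System.err.println("RegionServer exited with " + exit
                    + "; restarting in 5s");
            Thread.sleep(5_000);
        }
    }
}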