Thanks Andy,

This is in line with what I was asking/thinking of.

To separate concerns a bit:
The question is whether HRegionServer, instead of exiting, could have a
standby state (maintaining the current region state would be optional,
depending on whether we have shadow regions or not) where it doesn't serve
requests but is still visible to the cluster. This way we could "wake it
up" if we decide that is safe (e.g. we remain consistent).

This scenario seems to solve both my laptop-standby problem and
(potentially fast) recovery after a network partition.
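
As a stopgap, the supervisory-script approach mentioned further down the
thread can be tiny. A minimal sketch, assuming jps is on the PATH and
HBASE_HOME points at the install; the match on the jps output and the
restart command are illustrative, not an official recipe:

```shell
#!/bin/sh
# Minimal supervision loop: restart the RegionServer whenever its JVM
# disappears (e.g. after a laptop resume expires the ZooKeeper session).
# HBASE_HOME default and the jps-based check are assumptions for
# illustration, not part of any HBase distribution.

HBASE_HOME=${HBASE_HOME:-/usr/local/hbase}

# is_rs_up: succeed if jps-style output on stdin lists a RegionServer JVM.
is_rs_up() {
  grep -q 'HRegionServer'
}

# Pass --run to start the loop; keeping it behind a flag leaves the
# function sourceable without side effects.
if [ "${1:-}" = "--run" ]; then
  while true; do
    if ! jps | is_rs_up; then
      echo "$(date) RegionServer down, restarting"
      "$HBASE_HOME/bin/hbase-daemon.sh" start regionserver
    fi
    sleep 10
  done
fi
```

Daemontools, runit, or systemd would of course do the same job more
robustly; the point is only that nothing in HBase itself has to change
for the restart to be automatic.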

Cosmin



On 6/3/14, 3:42 PM, "Andrew Purtell" <[email protected]> wrote:

>Although a dev laptop suspend and resume is equivalent to a total network
>failure for the elapsed time, with all that implies, perhaps after
>HBASE-10070 goes in it could be possible for a RS to stay up with its
>regions switched to read-only replica mode and reconnect; the master, upon
>discovering all regions still available on the cluster but in read-only
>mode (no primary), could then reassign primaries, the net effect being
>that no process needs restarting by a supervisor.
>
>
>On Tue, Jun 3, 2014 at 3:27 PM, Andrew Purtell <[email protected]>
>wrote:
>
>> Stating the obvious, you just need to restart the RegionServer because it
>> shut down. We use ZooKeeper for tracking server liveness, and from the
>> ZooKeeper perspective a sufficiently long time elapsed without a heartbeat
>> that the RegionServer's session expired. To date we've left this to the
>> user to handle with supervisory scripts, e.g. Puppet / Chef /
>> Daemontools. I suppose a RegionServer could try to reinitialize as a
>>new
>> process, or the ./bin/hbase script could do this if you ask.
>>
>>
>> On Tue, Jun 3, 2014 at 2:46 PM, Cosmin Lehene <[email protected]> wrote:
>>
>>> I just realized that, for years, I've been countlessly restarting HBase
>>> every time my laptop comes out of standby.
>>> I know well why I do this, but I also know I could probably avoid it;
>>> I don't have to do it with Hadoop or ZooKeeper or other services, and
>>> I wish I didn't need to with HBase either.
>>>
>>> So short term, I'd like to know if there's a better way already.
>>>
>>> Long term, I think this is a bigger, more fundamental resiliency aspect
>>> that is perhaps not trivial, but probably worth thinking about in the
>>> context of real deployments, and I wonder if something already tries to
>>> solve this.
>>>
>>> Thoughts?
>>>
>>> Thanks,
>>> Cosmin
>>>
>>
>>
>>
>> --
>> Best regards,
>>
>>    - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>>
>
>
>
>-- 
>Best regards,
>
>   - Andy
>
>Problems worthy of attack prove their worth by hitting back. - Piet Hein
>(via Tom White)
