[ 
https://issues.apache.org/jira/browse/HBASE-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-8748.
----------------------------------
    Resolution: Won't Fix

Stale. Context is different now.

> Be able to accommodate zookeeper going away for a minute or two -- or more
> --------------------------------------------------------------------------
>
>                 Key: HBASE-8748
>                 URL: https://issues.apache.org/jira/browse/HBASE-8748
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: Zookeeper
>            Reporter: Michael Stack
>            Priority: Major
>
> I was talking with Christophe Taton yesterday and he asked what happens if 
> zookeeper goes away for a minute or two, say because of a network or 
> ensemble hiccup of some type.
> Unless the ensemble comes back inside the zk session timeout, the cluster 
> will go down.
> To my knowledge, zk has hiccuped a few times.  There was the bug where 
> sequence numbers rolled around the top causing the ensemble to blip (fixed in 
> a newer zk).  There was another event in which <speculation>some combination 
> of a leader election and accumulated log files (>100k)</speculation> caused 
> the ensemble to blip at SU.  
> At FB, apparently, the zk session timeout is set way up -- > 5 minutes -- in 
> case a top-of-the-rack switch reboots and partitions the network, separating 
> nodes from the zk ensemble.  Rather than rely on the presence of ephemeral 
> nodes, they depend on heartbeats to determine whether a regionserver is 
> present (w/ some smarts so that if all members of a rack disappear at the 
> same time, it is treated as unlikely that they all crashed at once).
> I am stating the obvious, I know, but the base presumption that zk will 
> always be there is lazy on our part, and we should not be acting as though 
> it will.
> Marking this as a brainstorming issue because undoing our current 
> presumption will need a bit of discussion/design.
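
For illustration only, the heartbeat idea described in the issue (judging regionserver presence by heartbeats, and treating a whole rack going silent at once as a probable network partition rather than a mass crash) might be sketched roughly as below. This is not HBase or FB code. The class name HeartbeatTracker, the Verdict values, the method names, and the timeout handling are all hypothetical, and timestamps are passed in explicitly to keep the sketch deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of heartbeat-based liveness with a rack-level heuristic. */
class HeartbeatTracker {
    enum Verdict { ALIVE, SUSPECT_DEAD, RACK_UNREACHABLE }

    private final long timeoutMillis;
    private final Map<String, String> rackOf = new HashMap<>();  // server -> rack
    private final Map<String, Long> lastBeat = new HashMap<>();  // server -> last heartbeat time

    HeartbeatTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Record a heartbeat from a server on the given rack. */
    void heartbeat(String server, String rack, long nowMillis) {
        rackOf.put(server, rack);
        lastBeat.put(server, nowMillis);
    }

    private boolean silent(String server, long nowMillis) {
        return nowMillis - lastBeat.get(server) > timeoutMillis;
    }

    /** Classify a known server based on its own and its rack-mates' heartbeats. */
    Verdict check(String server, long nowMillis) {
        if (!lastBeat.containsKey(server)) {
            throw new IllegalArgumentException("unknown server: " + server);
        }
        if (!silent(server, nowMillis)) {
            return Verdict.ALIVE;
        }
        String rack = rackOf.get(server);
        int members = 0;
        int silentMembers = 0;
        for (Map.Entry<String, String> e : rackOf.entrySet()) {
            if (e.getValue().equals(rack)) {
                members++;
                if (silent(e.getKey(), nowMillis)) {
                    silentMembers++;
                }
            }
        }
        // Assumption: if every server on a multi-server rack went silent,
        // a switch reboot or partition is more likely than simultaneous crashes.
        if (members > 1 && silentMembers == members) {
            return Verdict.RACK_UNREACHABLE;
        }
        return Verdict.SUSPECT_DEAD;
    }
}
```

A caller would presumably hold off on reassigning regions for servers classified RACK_UNREACHABLE until a longer grace period expires, and act sooner on SUSPECT_DEAD. That policy is likewise an assumption, not something stated in the issue.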



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
