Hi Ryan,

I think the idea of a secondary watchdog node is a decent one, but as you
mentioned, it isn't a solution for the problem at hand. The GC pause
exacerbates the problem, but network blips, etc., can cause the same problem.
Is there a JIRA open for the watchdog process? I think we should discuss it
separately. A few weeks ago I had proposed on IRC the ridiculously named
"SeppukuNode", which is a similar but not quite the same idea - we should hash
those out on JIRA.

-Todd

On Wed, Mar 17, 2010 at 11:38 AM, Ryan Rawson <ryano...@gmail.com> wrote:
> There are 2 ways to lose your ZK session:
>
> - you don't send pings back to ZK and it expires your session (GC pause of
>   death, network disconnect, etc.)
> - ZK "somehow" expires your session for you. I have seen this once in a
>   while; it's rare, but painful when it happens. It didn't seem to be
>   correlated to GC pauses at the time.
>
> So here is the proposal in full:
> - RegionServerWatcher (RSW) starts the ZK pingback and exists to listen for
>   termination notifications from the RegionServer (via good old-fashioned
>   OS primitives).
> - RSW keeps the ZK node up and keeps tabs on its child, perhaps checking
>   ports, or whatnot.
> - If the RS dies, RSW kills the ZK ephemeral node. No race conditions,
>   because the log append terminates before the master takes action (which
>   it does only after the ZK notification comes in).
> - If an RS goes into a long GC pause, the RSW can decide to wait it out or
>   kill -9 the RS and release the HLog. Again, no race condition, for the
>   previous reason.
> - If a network outage takes the node out, this is where a race condition
>   could occur. In that case, option #1 seems super clean and awesome. It
>   also has the advantage of being really easy to understand (always a plus
>   at 2am).
>
> The overall advantage of my proposal is that we can tune the ZK timeout
> down to something really small, like 10 seconds. That way, when network
> events take a node out of service, we can detect and respond much faster.
> Also, with a separate process we now have the ability to react instantly
> to crashes without waiting for a timeout. A disadvantage is more moving
> parts, but we can probably abstract this away cleanly.
>
> One last thought - if we have a 10-second timeout and a network partition,
> we will see a cascade of failed regionservers. Considering that the
> individual RSes may not be able to proceed anyway (they might have been
> cut off from too many datanodes to log or read hfiles), this might be
> inevitable. Obviously this means running HBase across a WAN is right out
> (we always knew that, right?), but this is why we are doing replication.
>
> On Wed, Mar 17, 2010 at 10:55 AM, Todd Lipcon <t...@cloudera.com> wrote:
> > On Wed, Mar 17, 2010 at 10:48 AM, Ryan Rawson <ryano...@gmail.com>
> > wrote:
> >
> >> I have a 4th option :-) I'm on the bus right now and I'll write it up
> >> when I get to work. In short: move the ZK thread out of the RS into a
> >> monitoring parent, and then you can explicitly monitor for Juliet GC
> >> pauses. More to come....
> >>
> >
> > I don't think that will be correct - it might be mostly correct, but
> > "Juliet GC pauses" are just an extra-long version of what happens all
> > the time. ZK is asynchronous, so we will never find out immediately if
> > we've been killed. There can always be an arbitrarily long pause between
> > looking at ZK state and taking an action.
> >
> > -Todd
> >
> >
> >> On Mar 17, 2010 10:22 AM, "Karthik Ranganathan"
> >> <kranganat...@facebook.com> wrote:
> >>
> >> Loved the "Juliet" terminology as well :).
> >>
> >> @Todd: I agree we will need something like #2 or especially #3 in other
> >> places.
> >>
> >> Looks like we have a consensus - I will update the JIRA.
> >>
> >> Thanks
> >> Karthik
> >>
> >>
> >> -----Original Message-----
> >> From: Todd Lipcon [mailto:t...@cloudera.com]
> >> Sent: Tuesday, March 16, 2010 10:09 PM
> >> To: hbase-dev@hadoop.apache.org
> >> Subject: Re: HBASE-2312 discu...
> >>
> >> On Tue, Mar 16, 2010 at 8:59 PM, Stack <st...@duboce.net> wrote:
> >>
> >> > On Tue, Mar 16, 2010 at 5:08 PM,...
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera


--
Todd Lipcon
Software Engineer, Cloudera
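
For readers following the thread, here is a minimal sketch of the
parent-watchdog shape Ryan describes above. This is illustrative only, not
HBase code: the class name, znode path, ZK connect string, and child command
line are all invented, error handling is omitted, and the "wait out or kill
-9 a GC-paused child" logic is left out.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class RegionServerWatcher {
  public static void main(String[] args) throws Exception {
    // The watchdog, not the regionserver, owns the ZK session, so a GC
    // pause in the regionserver JVM cannot cause missed pings. That is
    // what makes a short session timeout (e.g. 10s) safe to consider.
    ZooKeeper zk = new ZooKeeper("zkhost:2181", 10000, event -> { });

    // Hypothetical znode path; stands in for wherever the RS would
    // normally register itself.
    String znode = "/hbase/rs/"
        + java.net.InetAddress.getLocalHost().getHostName();
    zk.create(znode, new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

    // Launch the regionserver as a child process and watch it with plain
    // OS primitives: waitFor() blocks until the child exits.
    Process rs = new ProcessBuilder("hbase", "regionserver", "start")
        .inheritIO()
        .start();
    int exit = rs.waitFor();

    // Child is gone: delete the ephemeral node explicitly so the master
    // can react at once instead of waiting for the session to expire.
    zk.delete(znode, -1);
    zk.close();
    System.exit(exit);
  }
}

The point of the sketch is the division of labor: the ZK heartbeats come
from the watchdog's JVM, and waitFor() returns the moment the child dies,
so the ephemeral node can be removed immediately rather than after a
timeout. Note it does nothing about Todd's objection - there is still an
arbitrarily long gap between observing ZK state and acting on it.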