Re: HA namenode questions

Eric Newton Fri, 14 Mar 2014 13:52:31 -0700

Thanks Mike (and Todd), that clears things up.  I was not aware that the
zookeeper locks were held by a separate process (ZKFC).


-Eric



On Fri, Mar 14, 2014 at 4:24 PM, Mike Drob <[email protected]> wrote:

> Replies from Todd Lipcon in-line.
>
> Mike
>
> ---------- Forwarded message ----------
> From: Todd Lipcon <[email protected]>
> Date: Fri, Mar 14, 2014 at 4:14 PM
> Subject: Re: HA namenode questions
>
>
> I'm not on dev@accumulo upstream list anymore, but here's an answer. Feel
> free to forward onto the public list (I've known Eric for a while)
>
>
> ---------- Forwarded message ----------
> > From: Eric Newton <[email protected]>
> > Date: Fri, Mar 14, 2014 at 3:18 PM
> > Subject: HA namenode questions
> > To: [email protected]
> >
> >
> > For those of you running HA NN on large clusters, I'm looking for some
> > advice.
> >
> > I was looking at an HA NN config today.  Either by default, or by
> > following the configuration instructions, I saw that the zookeeper timeout
> > was set to 5 seconds.
> >
> > * is this a reasonable timeout?
> >
> >
> Yes -- this timeout is only used from the ZKFC process, which is a very
> lightweight process whose _only_ jobs are to (a) ping ZK, and (b) ping the
> NN to check its health. It has on the order of a few MB of heap usage, so
> should never GC. If it goes away longer than 5 seconds something is almost
> certainly wrong with the machine or network.
>
> That said, if you would rather ride out a longer network blip (eg a switch
> reboot) you could choose to make it longer.
>
>
> >  * do you provide HA NN its own set of zookeepers?
> >
> >
> So long as the ZKs aren't ridiculously overloaded, sharing should be fine.
> If you have a lot of un-tamed clients to some other ZK cluster, it's
> probably best from an isolation perspective to run your own ensemble for HA
> purposes. But, the ZK daemons could be colocated on the NNs + JT for
> example so long as they get dedicated spindles.
>
>
> >  We have seen problems with large GC pauses with tablet servers.  This
> > happens less and less as we have learned more tricks, but I'm constantly
> > talking to users who want their zookeeper timeout as high as two minutes.
> >
> Yea, the ZKFC has no heap usage, so no GC.
>
>
> >  We have also had to increase the number of zookeepers on our largest
> > clusters in order to handle the "thundering herd" load when large
> > map/reduce jobs kick off and they all start talking to accumulo, which
> > requires reading information from zookeeper.
> >
> Clients today in HDFS HA don't ever talk to ZK, so the number of nodes
> accessing ZK is limited to just the two NNs.
>
> >  Any experience you can share about HA NN configuration at scales over
> > a few hundred nodes would be appreciated.
> >
> The ZK interaction should have no dependence on cluster size. The time it
> takes a NN to become active can depend on the number of blocks in the
> cluster, but you should be able to see that by doing some "practice
> failovers". We're working on making the transitionToActive process quicker
> and more constant-time rather than dependent on initializing block
> replication queues inline with the failover.
>
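
For anyone who does want to ride out a longer network blip, as Todd
suggests, here is a minimal sketch of generating the relevant config entry.
The thread never names the property; this assumes it is the core-site.xml
setting ha.zookeeper.session-timeout.ms (value in milliseconds) read by the
ZKFC. The helper name zkfc_timeout_property is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the ZKFC session timeout discussed in the thread is the
# core-site.xml property below; the thread itself does not name it.
ZKFC_TIMEOUT_KEY = "ha.zookeeper.session-timeout.ms"


def zkfc_timeout_property(timeout_ms: int) -> str:
    """Render a Hadoop-style <property> element for the ZKFC session timeout.

    Hypothetical helper: it only builds the XML fragment and does not
    touch any running cluster or existing config file.
    """
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be a positive number of milliseconds")
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = ZKFC_TIMEOUT_KEY
    ET.SubElement(prop, "value").text = str(timeout_ms)
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # 5 seconds, the value Eric saw in the config he was reading.
    print(zkfc_timeout_property(5000))
```

Note that the ZooKeeper servers also bound the session timeout they will
grant, so a much larger value may need a matching change on the ensemble.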
