Considering that we're opting for a WAN deployment that is not going to use groups, weights, etc., and that we are going to implement an ensemble-to-ensemble sync mechanism... what version of ZooKeeper do you recommend?
> -----Original Message-----
> From: Todd Greenwood
> Sent: Wednesday, August 05, 2009 2:21 PM
> To: 'zookeeper-dev@hadoop.apache.org'
> Subject: RE: Optimized WAN ZooKeeper Config : Multi-Ensemble configuration
>
> Mahadev, comments inline:
>
> > -----Original Message-----
> > From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> > Sent: Wednesday, August 05, 2009 1:47 PM
> > To: zookeeper-dev@hadoop.apache.org
> > Subject: Re: Optimized WAN ZooKeeper Config : Multi-Ensemble configuration
> >
> > Todd,
> > Comments inline:
> >
> > On 8/5/09 12:10 PM, "Todd Greenwood" <to...@audiencescience.com> wrote:
> >
> > > Flavio/Patrick/Mahadev -
> > >
> > > Thanks for your support to date. As I understand it, the sticky points
> > > w/ respect to WAN deployments are:
> > >
> > > 1. Leader Election:
> > >
> > > Leader election in the WAN config (pod zk server weight = 0) is a bit
> > > troublesome (ZOOKEEPER-498).
> >
> > Yes, until ZOOKEEPER-498 is fixed, you won't be able to use it with
> > groups and zero weight.
> >
> > > 2. Network Connectivity Required:
> > >
> > > ZooKeeper clients cannot read/write to ZK servers if the server does
> > > not have network connectivity to the quorum. In short, there is a hard
> > > requirement to have network connectivity in order for the clients to
> > > access the shared memory graph in ZK.
> >
> > Yes.
> >
> > > Alternative
> > > -----------
> > >
> > > I have seen some discussion in the past about multi-ensemble
> > > solutions. Essentially, put one ensemble in each physical location
> > > (POD), and another in your DC, and have a fairly simple process
> > > coordinate synchronizing the various ensembles. If the POD writes can
> > > be confined to a sub-tree in the master graph, then this should be
> > > fairly simple. I'm imagining the following:
> > >
> > > DC (master) graph:
> > > /root/pods/1/data/item1
> > > /root/pods/1/data/item2
> > > /root/pods/1/data/item3
> > > /root/pods/2
> > > /root/pods/3
> > > ...etc
> > > /root/shared/allpods/readonly/data/item1
> > > /root/shared/allpods/readonly/data/item2
> > > ...etc
> > >
> > > This has the advantage of minimizing cross-pod traffic, which could be
> > > a real perf killer in a WAN. It also provides transacted writes in the
> > > PODs, even in the disconnected state. Clearly, another portion of the
> > > business logic has to reconcile the DC (master) graph such that each
> > > of the pods' data items are processed, etc.
> > >
> > > Does anyone have any experience with this (pitfalls, suggestions, etc.)?
> >
> > As far as I understand, you mean having a master cluster with another
> > cluster in a different data center syncing with the master (just a
> > subtree)? Is that correct?
> >
> > If yes, this is what one of our users in Yahoo! Search does. They have a
> > master cluster and a smaller cluster in a different datacenter, and a
> > bridge that copies data from the master cluster (only a subtree) to the
> > smaller one and keeps them in sync.
>
> Yes, this is exactly what I'm proposing, with the addition that I'll sync
> subtrees in both directions and have a separate process reconcile data
> from the various pods, like so:
>
> # pod1 ensemble
> /root/a/b
>
> # pod2 ensemble
> /root/a/b
>
> # dc ensemble
> /root/shared/foo/bar
>
> # Mapping (modeled after perforce client config)
> # [ensemble]:[path] [ensemble]:[path]
> # sync pods to dc
> [POD1]:/root/... [DC]:/root/pods/POD1/...
> [POD2]:/root/... [DC]:/root/pods/POD2/...
> # sync dc to pods
> [DC]:/root/shared/... [POD1]:/shared/...
> [DC]:/root/shared/... [POD2]:/shared/...
> [DC]:/root/shared/... [POD3]:/shared/...
>
> Now, for our needs, we'd like the DC data aggregated, so I'll have another
> process handle aggregating the pod-specific data like so:
>
> POD Data Aggregator: aggregate data in [DC]:/root/pods/POD(N) to
> [DC]:/root/aggregated/data.
>
> This is just off the top of my head.
>
> -Todd
>
> > Thanks
> > mahadev
> >
> > > -Todd
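For context, a very rough sketch of what the pod-to-DC half of that bridge could look like against the standard ZooKeeper Java client API follows. The connection strings, the /root -> /root/pods/POD1 mapping, and the one-shot recursive copy (rather than a watch-driven continuous sync) are all illustrative assumptions, not something specified in the thread above:

    import java.util.List;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    /**
     * Minimal one-shot bridge sketch: copies a subtree from a source ensemble
     * (e.g. a POD) into a mapped path on a destination ensemble (e.g. the DC).
     * Assumes the destination parent path (e.g. /root/pods) already exists.
     */
    public class SubtreeBridge {

        private final ZooKeeper src;
        private final ZooKeeper dst;

        public SubtreeBridge(String srcConnect, String dstConnect) throws Exception {
            Watcher noop = new Watcher() {
                public void process(WatchedEvent event) { /* ignore for this sketch */ }
            };
            src = new ZooKeeper(srcConnect, 30000, noop);
            dst = new ZooKeeper(dstConnect, 30000, noop);
        }

        /** Recursively copy srcPath (source ensemble) to dstPath (destination ensemble). */
        public void copySubtree(String srcPath, String dstPath)
                throws KeeperException, InterruptedException {
            byte[] data = src.getData(srcPath, false, new Stat());

            if (dst.exists(dstPath, false) == null) {
                dst.create(dstPath, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            } else {
                // -1 = overwrite regardless of version; a real bridge would track
                // versions/zxids so it doesn't clobber concurrent writes.
                dst.setData(dstPath, data, -1);
            }

            List<String> children = src.getChildren(srcPath, false);
            for (String child : children) {
                copySubtree(srcPath + "/" + child, dstPath + "/" + child);
            }
        }

        public static void main(String[] args) throws Exception {
            // Hypothetical mapping: [POD1]:/root/... -> [DC]:/root/pods/POD1/...
            SubtreeBridge bridge = new SubtreeBridge("pod1-zk:2181", "dc-zk:2181");
            bridge.copySubtree("/root", "/root/pods/POD1");
        }
    }

A production bridge would presumably register watches (or poll) to pick up changes, handle deletes, and record versions so the copy stays idempotent when run in both directions, but the above is the basic shape of the per-subtree mapping idea.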