Mahadev, comments inline:

> -----Original Message-----
> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> Sent: Wednesday, August 05, 2009 1:47 PM
> To: zookeeper-dev@hadoop.apache.org
> Subject: Re: Optimized WAN ZooKeeper Config : Multi-Ensemble configuration
> 
> Todd,
>  Comments in line:
> 
> 
> > On 8/5/09 12:10 PM, "Todd Greenwood" <to...@audiencescience.com> wrote:
> 
> > Flavio/Patrick/Mahadev -
> >
> > Thanks for your support to date. As I understand it, the sticky points
> > w/ respect to WAN deployments are:
> >
> > 1. Leader Election:
> >
> > Leader elections in the WAN config (pod zk server weight = 0) are a bit
> > troublesome (ZOOKEEPER-498)
> Yes, until ZOOKEEPER-498 is fixed, you won't be able to use it with groups
> and zero weight.
> 
> >
> > 2. Network Connectivity Required:
> >
> > ZooKeeper clients cannot read/write to ZK Servers if the Server does not
> > have network connectivity to the quorum. In short, there is a hard
> > requirement to have network connectivity in order for the clients to
> > access the shared memory graph in ZK.
> Yes
> 
> >
> > Alternative
> > -----------
> >
> > I have seen some discussion in the past re: multi-ensemble
> > solutions. Essentially, put one ensemble in each physical location
> > (POD), and another in your DC, and have a fairly simple process
> > coordinate synchronizing the various ensembles. If the POD writes can be
> > confined to a sub-tree in the master graph, then this should be fairly
> > simple. I'm imagining the following:
> >
> > DC (master) graph:
> > /root/pods/1/data/item1
> > /root/pods/1/data/item2
> > /root/pods/1/data/item3
> > /root/pods/2
> > /root/pods/3
> > ...etc
> > /root/shared/allpods/readonly/data/item1
> > /root/shared/allpods/readonly/data/item2
> > ...etc
> >
> > This has the advantage of minimizing cross-pod traffic, which could be a
> > real perf killer in a WAN. It also provides transacted writes in the
> > PODs, even in the disconnected state. Clearly, another portion of the
> > business logic has to reconcile the DC (master) graph such that each of
> > the pods' data items are processed, etc.
> >
> > Does anyone have any experience with this (pitfalls, suggestions, etc.?)
> As far as I understand, you mean having a master cluster, with another one
> in a different data center syncing with the master (just a subtree)?
> Is that correct?
> 
> If yes, this is what one of our users in Yahoo! Search does. They have a
> master cluster and a smaller cluster in a different datacenter, and a
> bridge that copies data from the master cluster (only a subtree) to the
> smaller one and keeps them in sync.
> 

Yes, this is exactly what I'm proposing, with the addition that I'll
sync subtrees in both directions and have a separate process reconcile
data from the various pods, like so:

#pod1 ensemble
/root/a/b

#pod2 ensemble
/root/a/b

#dc ensemble
/root/shared/foo/bar

# Mapping (modeled after perforce client config)
# [ensemble]:[path] [ensemble]:[path]
# sync pods to dc
[POD1]:/root/... [DC]:/root/pods/POD1/...
[POD2]:/root/... [DC]:/root/pods/POD2/...
# sync dc to pods
[DC]:/root/shared/... [POD1]:/shared/...
[DC]:/root/shared/... [POD2]:/shared/...
[DC]:/root/shared/... [POD3]:/shared/...
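
To make the mapping concrete, here is a rough sketch of what one direction
of the bridge could look like against the plain ZooKeeper Java API. It
simply does a recursive copy of a source subtree into a remapped
destination path; watches, deletes, and conflict handling are left out,
and the class and host names (SubtreeBridge, pod1-zk, dc-zk) are made up:

import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// One-way bridge sketch: mirrors a source subtree into a destination
// ensemble under a remapped prefix. A real bridge would also need to
// handle deletes, use watches instead of full passes, and reconcile
// concurrent writes.
public class SubtreeBridge implements Watcher {

    private final ZooKeeper src;
    private final ZooKeeper dst;

    public SubtreeBridge(String srcConnect, String dstConnect) throws Exception {
        // 30s session timeout; this class doubles as a no-op connection Watcher.
        src = new ZooKeeper(srcConnect, 30000, this);
        dst = new ZooKeeper(dstConnect, 30000, this);
    }

    public void process(WatchedEvent event) {
        // Connection-state events only; node watches are not used in this sketch.
    }

    // Recursively copy srcPath into dstPath. Assumes the parent of dstPath
    // (e.g. /root/pods) already exists on the destination ensemble.
    public void mirror(String srcPath, String dstPath)
            throws KeeperException, InterruptedException {
        byte[] data = src.getData(srcPath, false, null);
        try {
            dst.create(dstPath, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                    CreateMode.PERSISTENT);
        } catch (KeeperException.NodeExistsException e) {
            // Node already mirrored once; overwrite (-1 skips the version check).
            dst.setData(dstPath, data, -1);
        }
        List<String> children = src.getChildren(srcPath, false);
        for (String child : children) {
            mirror(srcPath + "/" + child, dstPath + "/" + child);
        }
    }

    public static void main(String[] args) throws Exception {
        SubtreeBridge bridge = new SubtreeBridge("pod1-zk:2181", "dc-zk:2181");
        // One pass of the [POD1]:/root/... -> [DC]:/root/pods/POD1/... mapping.
        bridge.mirror("/root", "/root/pods/POD1");
    }
}

The DC-to-pod direction would be the same loop with src and dst swapped and
the /root/shared -> /shared mapping applied instead.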

Now, for our needs, we'd like the DC data aggregated, so I'll have
another process handle aggregating the pod-specific data like so:

POD Data Aggregator: aggregate data in [DC]:/root/pods/POD(N) to
[DC]:/root/aggregated/data.
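
Again just a sketch (same imports as the bridge above, and assuming
last-write-wins is acceptable): the aggregator lists the pods under
[DC]:/root/pods and copies each pod's data items under
/root/aggregated/data, prefixing with the pod name to avoid collisions.
The method name and exact path layout are only illustrative.

// Hypothetical aggregation pass, run against the DC ensemble:
// /root/pods/<pod>/data/<item> -> /root/aggregated/data/<pod>-<item>
public void aggregate(ZooKeeper dc)
        throws KeeperException, InterruptedException {
    for (String pod : dc.getChildren("/root/pods", false)) {
        String dataPath = "/root/pods/" + pod + "/data";
        if (dc.exists(dataPath, false) == null) {
            continue; // this pod hasn't synced any data yet
        }
        for (String item : dc.getChildren(dataPath, false)) {
            byte[] data = dc.getData(dataPath + "/" + item, false, null);
            String target = "/root/aggregated/data/" + pod + "-" + item;
            try {
                dc.create(target, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                        CreateMode.PERSISTENT);
            } catch (KeeperException.NodeExistsException e) {
                dc.setData(target, data, -1); // overwrite previous aggregate
            }
        }
    }
}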

This is just off the top of my head.

-Todd

> 
> Thanks
> mahadev
> >
> > -Todd
