Re: ClusterConnection package private

Enis Söztutar Fri, 14 Mar 2014 17:32:25 -0700

I was in favor of co-locating because we have had the "meta is one region"
design for so long, our regions are big, and we have not spent much time on
master redesign. However, in the ideal case we should go with the splittable
meta design from BT and shoot for regions sized around the HDFS block size
(128 / 512 MB), with millions of regions. The reason we currently get away
with a single meta region is that our regions can be 10-20 GB, so 100K
regions are enough to address 1-2 PB of data. It seems clear that we do not
want two state machines, one in master and one in meta per region, which
can diverge and make AM the hell that it is today. One way to ease this is
to move meta into master and ensure master in-memory == meta. The other way
would be to make master stateless and meta the only authoritative source. I
would vote for the latter.
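
As a rough check of the sizing argument above, here is a small
back-of-the-envelope sketch. The helper names and the use of binary units
are assumptions made for illustration; only the round figures (100K regions,
10-20 GB, 128 / 512 MB, 1-2 PB) come from the discussion.

```python
# Back-of-the-envelope region arithmetic (illustrative only).
# Assumption: binary units (1 GB = 2**30 bytes); the thread uses round numbers.
MB = 1024 ** 2
GB = 1024 ** 3
PB = 1024 ** 5


def addressable_bytes(num_regions, region_size_bytes):
    """Total data addressable by num_regions regions of the given size."""
    return num_regions * region_size_bytes


def regions_needed(total_bytes, region_size_bytes):
    """Number of regions needed to hold total_bytes (ceiling division)."""
    return -(-total_bytes // region_size_bytes)


if __name__ == "__main__":
    # Today: large regions, so a single meta region tracks enough of them.
    for size_gb in (10, 20):
        total = addressable_bytes(100_000, size_gb * GB)
        print(f"100K regions x {size_gb} GB = {total / PB:.2f} PB")

    # Block-sized regions: the region count for 1 PB goes into the millions.
    for size_mb in (128, 512):
        count = regions_needed(1 * PB, size_mb * MB)
        print(f"1 PB at {size_mb} MB per region = {count:,} regions")
```

With block-sized regions the count for a single petabyte is already in the
millions, which is the case where a single meta region stops being enough.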


Coming to the ClusterConnection, I thought that CoprocessorHConnection is
kind of similar. It should be fine to have an in-process ClusterConnection
implementation.

Enis


On Fri, Mar 14, 2014 at 3:23 PM, Nick Dimiduk <[email protected]> wrote:

> Taking advantage of region replicas will require the indirection and a
> potential network hop. A "short-circuit" local read optimization may be
> possible, but I don't think it is worth it for scanning meta.
>
> On Friday, March 14, 2014, Stack <[email protected]> wrote:
>
> > On Fri, Mar 14, 2014 at 1:22 PM, Jimmy Xiang <[email protected]>
> > wrote:
> >
> > > That means there will be many small meta regions. If we just have one
> > > instance of each region, that should help. But we are moving towards HA
> > > regions, right?
> > >
> > >
> > Even if the region is 'HA', there will be an indirection.
> >
> > So the question stands: should we do this direct route at all?  Is there
> > a big advantage?
> >
> > St.Ack
> >
>
