I was in favor of co-locating, because we have had the "meta is one region" assumption for so long, our regions are big, and we did not spend much time on master redesign. However, in the ideal case, we should go with the splittable meta design from Bigtable, and shoot for regions sized around the HDFS block size (128 MB to 512 MB), which means having millions of regions. The reason we currently get away with a single meta region is that our regions can be 10-20 GB, so 100K regions are enough to address 1-2 PB of data.

It seems clear that we do not want two state machines per region, one in the master and one in meta, which can diverge and make the AssignmentManager (AM) the hell that it is today. One way to ease this is to move meta into the master and ensure that the master's in-memory state == meta. The other way would be to make the master stateless and meta the only authoritative source. I would vote for the latter.
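To put rough numbers on the region sizes mentioned above, here is a back-of-envelope sketch. It assumes binary units (1 PB = 1024^5 bytes) and a hypothetical helper name, regions_needed; it only does the arithmetic and says nothing about how meta is actually stored.

```python
# Back-of-envelope region counts for a given amount of data.
# Assumption: binary units; regions_needed is a hypothetical helper.
MB = 1024 ** 2
GB = 1024 ** 3
PB = 1024 ** 5


def regions_needed(data_bytes, region_bytes):
    """Number of regions of region_bytes needed to cover data_bytes (rounded up)."""
    return -(-data_bytes // region_bytes)


if __name__ == "__main__":
    sizes = [("20 GB", 20 * GB), ("10 GB", 10 * GB),
             ("512 MB", 512 * MB), ("128 MB", 128 * MB)]
    for label, region_bytes in sizes:
        count = regions_needed(PB, region_bytes)
        print(f"{label:>6} regions to cover 1 PB: {count:,}")
```

With 10-20 GB regions, 1 PB needs on the order of 50K-100K regions, which one meta region can track. At 128-512 MB per region the same 1 PB needs roughly 2 to 8 million regions, which is where a splittable meta starts to matter.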
Coming to the ClusterConnection, I thought that CoprocessorHConnection is kind of similar. It should be fine to have an in-process ClusterConnection implementation.

Enis

On Fri, Mar 14, 2014 at 3:23 PM, Nick Dimiduk <[email protected]> wrote:

> Taking advantage of region replicas will require the indirection and
> potential network hop. Could be a "short-circuit" local read optimization
> is possible, but I don't think it worth it for scanning meta.
>
> On Friday, March 14, 2014, Stack <[email protected]> wrote:
>
> > On Fri, Mar 14, 2014 at 1:22 PM, Jimmy Xiang <[email protected]>
> > wrote:
> >
> > > That means there will be many small meta regions. If we just have one
> > > instance of each region, that should help. But we are moving towards HA
> > > regions, right?
> >
> > Even if the region is 'HA', there will be an indirection.
> >
> > So question stands, should we do this direct route at all? There is a big
> > advantage?
> >
> > St.Ack
