On Wed, Jan 15, 2014 at 1:47 PM, Enis Söztutar <enis....@gmail.com> wrote:
> > > > I am late to the game so take my comments w/ a grain of salt -- I'll
> > > > take a look at HBASE-10070 -- but high-level, do we have to go the
> > > > read replicas route? IMO, having our current already-strained
> > > > AssignmentManager code base manage three replicas instead of one will
> > > > ensure that Jimmy Xiang and Jeffrey Zhong do nothing else for the next
> > > > year or two but work on the new interesting use cases introduced by
> > > > this new level of complexity put upon a system that has just achieved
> > > > a hard-won stability.
> > >
> > > Stack, the model is that the replicas (HRegionInfo with an added field
> > > 'replicaId') are treated just as any other region in the AM. You can
> > > see the code - it's not adding much at all in terms of new code to
> > > handle replicas.
> >
> Adding to what Devaraj said, we opted for actually creating one more
> HRegionInfo object per region per replica count so that the assignment
> state machine is not affected. The high-level change is that we are
> creating (number of regions x replica count) regions and assigning them.
> The LB ensures that replicas are placed with high availability across
> hosts and racks.

Ok. Then it is about the same amount of work in either case. The LB is to
be altered to factor in namespaces. This replicas work seems equivalent,
only along another dimension (can the dimensions be joined so we get
namespace-aware balancing when replica-aware balancing is added?)

> However, with different tables, it will be unintuitive, since the meta
> and the client side would have to bring different regions of different
> tables together to make sense. Those tables will not have any associated
> data, but refer to the other tables, etc.

That is right. HBase core would go untouched. The read replica 'construct'
would be an imposition done in a layer above.

> Trying to minimize the new code getting to the objective.
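The replica expansion Enis describes above (one region object per region per replica, with the LB spreading replicas across hosts and racks) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the HBASE-10070 code: `RegionReplica`, `expand_replicas` and `place_replicas` are hypothetical names, and the greedy "distinct host, prefer an unused rack, then least loaded" rule is a stand-in for whatever the real balancer does.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical stand-in for HRegionInfo plus the added 'replicaId' field
# (replica_id 0 is the primary). Not the real HBase class.
@dataclass(frozen=True)
class RegionReplica:
    table: str
    start_key: str
    replica_id: int


def expand_replicas(regions, replica_count):
    """Create replica_count region objects per (table, start_key) region,
    so each one can be assigned like any ordinary region."""
    return [
        RegionReplica(table, start_key, replica_id)
        for table, start_key in regions
        for replica_id in range(replica_count)
    ]


def place_replicas(replicas, servers):
    """Greedy placement (hypothetical). servers maps host -> rack.

    Never puts two replicas of one region on the same host; among the
    remaining hosts, prefers a rack that holds no replica of that region
    yet, then the least-loaded host (host name breaks ties)."""
    load = {host: 0 for host in servers}
    hosts_used = defaultdict(set)  # (table, start_key) -> hosts holding a replica
    racks_used = defaultdict(set)  # (table, start_key) -> racks holding a replica
    assignment = {}
    for replica in replicas:
        region = (replica.table, replica.start_key)
        candidates = [h for h in servers if h not in hosts_used[region]]
        if not candidates:
            raise ValueError(f"not enough hosts to spread replicas of {region}")
        # False sorts before True, so hosts on an unused rack come first.
        best = min(
            candidates,
            key=lambda h: (servers[h] in racks_used[region], load[h], h),
        )
        assignment[replica] = best
        load[best] += 1
        hosts_used[region].add(best)
        racks_used[region].add(servers[best])
    return assignment


if __name__ == "__main__":
    servers = {"h1": "rackA", "h2": "rackA", "h3": "rackB", "h4": "rackB"}
    replicas = expand_replicas([("t1", ""), ("t1", "m")], replica_count=3)
    for replica, host in place_replicas(replicas, servers).items():
        print(replica, "->", host, servers[host])
```

Because every replica is just another region object, the assignment path itself is unchanged; what grows is the number of regions the AM tracks, by a factor of `replica_count`.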
> I think these should be addressed by the region changes section in the
> design doc. In the region-snapshots section, we detail how this will be
> like single-region snapshots. We do not need table snapshots per se,
> since we are opening the region replica from the files of the primary.
> There is already a working patch for this in the branch. In the async-wal
> replication section, we mention how this can be built using the existing
> replication mechanism. We cannot directly replicate to a different table
> since we do not want to multiply the actual data in HDFS. But we will
> tap into the replica sink to do the in-cluster replication.

OK.

> > Quorum read/writes as in Paxos, Raft (Liyin talked about the Facebook
> > Hydrabase project at his keynote at HBaseCon last year).
> >
> That won't happen without major architecture surgery in HBase.
> HBASE-10070 is some major work, but is in no way a major arch change, I
> would say. Hydrabase / Megastore is also across DCs, while we are mostly
> interested in intra-DC availability right now.

Timeline is one of the questions I have up on HBASE-10070.

Thanks E,
St.Ack
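The region-snapshot plus async-WAL replication approach Enis describes above (secondaries open the primary's files instead of copying the data, and receive edits through a replication sink) can be modeled roughly as below. This is a hypothetical sketch, not the branch's implementation: `SharedStoreFiles`, `Primary`, `Secondary`, `ReplicaSink` and the `FLUSH` marker are illustrative names, the WAL is a Python list, secondaries see a new store file as soon as the primary writes it, and having a flush marker in the WAL stream tell a secondary to drop its buffered edits is an assumption, not something stated in the thread.

```python
from dataclasses import dataclass, field

FLUSH = None  # hypothetical WAL marker: "primary flushed its memstore to a store file"


@dataclass
class SharedStoreFiles:
    """Store files written by the primary and read by secondaries, so the
    data is not duplicated (a stand-in for the region's files in HDFS)."""
    files: list = field(default_factory=list)  # each file: dict key -> value


class Primary:
    def __init__(self, store):
        self.store = store
        self.memstore = {}
        self.wal = []  # ordered (key, value) edits and FLUSH markers

    def put(self, key, value):
        self.wal.append((key, value))
        self.memstore[key] = value

    def flush(self):
        self.store.files.append(dict(self.memstore))
        self.memstore = {}
        self.wal.append(FLUSH)


class Secondary:
    def __init__(self, store):
        self.store = store  # the same files as the primary, not a copy
        self.memstore = {}  # edits received from the sink, not yet flushed

    def apply(self, entry):
        if entry is FLUSH:
            # Everything buffered so far is now in the shared files.
            self.memstore.clear()
        else:
            key, value = entry
            self.memstore[key] = value

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for f in reversed(self.store.files):  # newest file first
            if key in f:
                return f[key]
        return None  # reads may be stale: the sink can lag the primary


class ReplicaSink:
    """Ships WAL entries from the primary to secondaries asynchronously;
    in this model, shipping only happens when ship() is called."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = secondaries
        self.shipped = 0  # index of the next WAL entry to ship

    def ship(self, max_entries=None):
        end = len(self.primary.wal)
        if max_entries is not None:
            end = min(end, self.shipped + max_entries)
        for entry in self.primary.wal[self.shipped:end]:
            for secondary in self.secondaries:
                secondary.apply(entry)
        self.shipped = end
```

Reads on a secondary can lag until the sink catches up, but flushed data exists only once, in the shared files, which is the "do not multiply the actual data in HDFS" constraint from the thread.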