We could use ZooKeeper paths as a way for replication endpoints to know about each other.
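As a rough sketch of what that could look like: each peer cluster advertises itself under a well-known znode, and its region servers register ephemeral children that other clusters can list and watch. The root path, cluster IDs, and helper names below are illustrative assumptions, not an actual HBase or ZooKeeper layout; only the path-building logic is shown, with no live ZooKeeper connection.

```java
// Hypothetical znode layout for replication peer discovery.
// Assumed (not real) layout:
//   /hbase/replication/peers/<clusterId>/<host>:<port>  (ephemeral)
// A watcher on /hbase/replication/peers would then learn about new
// peer clusters; a watcher on a peer's znode would learn its endpoints.
public class PeerPaths {
    static final String ROOT = "/hbase/replication/peers";

    // znode under which one peer cluster advertises itself
    static String peerZnode(String clusterId) {
        return ROOT + "/" + clusterId;
    }

    // each region server of the peer would register an ephemeral child here
    static String endpointZnode(String clusterId, String host, int port) {
        return peerZnode(clusterId) + "/" + host + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(endpointZnode("dc-west", "rs1.example.com", 60020));
    }
}
```

Ephemeral znodes would give crash detection for free: a region server that dies simply disappears from its peer's endpoint list.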
On Fri, Apr 24, 2009 at 8:16 PM, Andrew Purtell (JIRA) <[email protected]> wrote:
>
> [ https://issues.apache.org/jira/browse/HBASE-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702647#action_12702647 ]
>
> Andrew Purtell commented on HBASE-1295:
> ---------------------------------------
>
> bq. In the case a cluster is down, I guess that would mean the other clusters would have to keep all the WALs until it is up again. At that moment, it may receive tons of WALs, right?
>
> Yes, the effect of a partition and extended outage is a buildup of WALs on the peer clusters, and then a lot of backlog. Let me think about this case and post a revised slide deck.
>
> bq. Also I was wondering, if you want to add a new cluster, would the way to go be replicating all the data to the other cluster by hand (MR or else), then somehow telling the clusters that they have a new peer?
>
> I was anticipating that the cluster would be advertised as a peer, somehow, and that replication would then start. The replicators should add tables and column families to their local schema on demand as the cells are received, and perhaps additionally ask the peer about column family details as necessary. Whether or not to bring over existing data would be a deployment/application concern, I think, and could be handled by a MR export-import job.
>
> > Federated HBase
> > ---------------
> >
> >         Key: HBASE-1295
> >         URL: https://issues.apache.org/jira/browse/HBASE-1295
> >     Project: Hadoop HBase
> >  Issue Type: New Feature
> >    Reporter: Andrew Purtell
> > Attachments: hbase_repl.1.pdf
> >
> >
> > HBase should consider supporting a federated deployment where someone might have terascale (or beyond) clusters in more than one geography and would want the system to handle replication between the clusters/regions. It would be sweet if HBase had something on the roadmap to sync between replicas out of the box.
> > Consider if rows, columns, or even cells could be scoped: local, or global.
> > Then, consider a background task on each cluster that replicates new globally scoped edits to peer clusters. The HBase/Bigtable data model has convenient features (timestamps, multiversioning) such that simple exchange of globally scoped cells would be conflict free and would "just work". Implementation effort here would be in producing an efficient mechanism for collecting up edits from all the HRS and transmitting the edits over the network to peers, where they would then be split out to the HRS there. Holding on to the edit trace and tracking it until the remote commits succeed would also be necessary. So, HLog is probably the right place to set up the tee. This would be filtered log shipping, basically.
> >
> > This proposal does not consider transactional tables. For transactional tables, enforcement of global mutation commit ordering would come into the picture if the user wants the transaction to span the federation. This should be an optional feature even with transactional tables themselves being optional, because of how slow it would be.
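The "conflict free" claim in the quoted proposal can be sketched concretely: because every cell carries a timestamp and the store is multiversioned, applying a remote cell is just inserting it at its timestamp, which makes the merge commutative and idempotent, so peers converge regardless of arrival order. The class and method names below are illustrative, not HBase internals; this models the versions of a single (row, column).

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of one (row, column) in a multiversioned store: versions
// keyed by timestamp, newest first. Applying replicated cells in any
// order, any number of times, yields the same final state.
public class CellMerge {
    private final NavigableMap<Long, String> versions =
        new TreeMap<>(Comparator.reverseOrder());

    // apply a locally written or remotely replicated cell
    void apply(long timestamp, String value) {
        versions.putIfAbsent(timestamp, value); // idempotent on replay
    }

    // a read returns the newest version, exactly as Bigtable/HBase do
    String latest() {
        return versions.firstEntry().getValue();
    }
}
```

Note the caveat this glosses over: two clusters writing the same cell at the same timestamp is a genuine conflict, which the timestamp scheme resolves arbitrarily rather than detects.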
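The "holding on to the edit trace until the remote commits succeed" point, combined with the WAL buildup discussed for the outage case, amounts to per-peer retention bookkeeping: a WAL file may only be deleted once every peer has acknowledged it. A minimal sketch of that accounting, with illustrative names and no actual HBase types:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Per-peer queues of WAL files awaiting remote commit. During a peer
// outage its queue simply grows (the backlog described above); when the
// peer returns and acknowledges a position, earlier files become
// eligible for deletion.
public class WalBacklog {
    private final Map<String, Deque<String>> pending = new HashMap<>();

    // a new WAL file must be shipped to this peer
    void enqueue(String peer, String walFile) {
        pending.computeIfAbsent(peer, p -> new ArrayDeque<>()).add(walFile);
    }

    // peer acknowledged everything up to and including upToWal;
    // returns the files this peer no longer needs
    List<String> ack(String peer, String upToWal) {
        List<String> done = new ArrayList<>();
        Deque<String> q = pending.getOrDefault(peer, new ArrayDeque<>());
        while (!q.isEmpty()) {
            String f = q.poll();
            done.add(f);
            if (f.equals(upToWal)) break;
        }
        return done;
    }

    // outstanding files for a peer (grows unboundedly during an outage)
    int backlog(String peer) {
        return pending.getOrDefault(peer, new ArrayDeque<>()).size();
    }
}
```

A real implementation would also need the cross-peer intersection (a file is deletable only when absent from every peer's queue) and some bound or operator alert on backlog growth, since the extended-outage case otherwise fills the disk.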
