We could use ZooKeeper paths as a way for replication endpoints to know about each other.
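As a rough sketch of what that could look like: each peer cluster advertises itself under a well-known znode, and its region servers register ephemeral children that other clusters can list and watch. The root path, cluster IDs, and helper names below are illustrative assumptions, not an actual HBase or ZooKeeper layout; only the path-building logic is shown, with no live ZooKeeper connection.

```java
// Hypothetical znode layout for replication peer discovery.
// Assumed (not real) layout:
//   /hbase/replication/peers/<clusterId>/<host>:<port>  (ephemeral)
// A watcher on /hbase/replication/peers would then learn about new
// peer clusters; a watcher on a peer's znode would learn its endpoints.
public class PeerPaths {
    static final String ROOT = "/hbase/replication/peers";

    // znode under which one peer cluster advertises itself
    static String peerZnode(String clusterId) {
        return ROOT + "/" + clusterId;
    }

    // each region server of the peer would register an ephemeral child here
    static String endpointZnode(String clusterId, String host, int port) {
        return peerZnode(clusterId) + "/" + host + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(endpointZnode("dc-west", "rs1.example.com", 60020));
    }
}
```

Ephemeral znodes would give crash detection for free: a region server that dies simply disappears from its peer's endpoint list.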
On Fri, Apr 24, 2009 at 8:16 PM, Andrew Purtell (JIRA) <[email protected]> wrote:
>
> [ https://issues.apache.org/jira/browse/HBASE-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702647#action_12702647 ]
>
> Andrew Purtell commented on HBASE-1295:
> ---------------------------------------
>
> bq. In the case a cluster is down, I guess that would mean the other clusters would have to keep all the WALs until it is up again. At that moment, it may receive tons of WALs, right?
>
> Yes, the effect of a partition and extended outage is a buildup of WALs on the peer clusters, and then a lot of backlog. Let me think about this case and post a revised slide deck.
>
> bq. Also I was wondering, if you want to add a new cluster, would the way to go be replicating all the data to the other cluster by hand (MR or else), then somehow telling the clusters that they have a new peer?
>
> I was anticipating that the cluster would be advertised as a peer, somehow, and that replication would then start. The replicators should add tables and column families to their local schema on demand as the cells are received, and perhaps additionally ask the peer about column family details as necessary. Whether or not to bring over existing data would be a deployment/application concern, I think, and could be handled by a MR export-import job.
>
> > Federated HBase
> > ---------------
> >
> >         Key: HBASE-1295
> >         URL: https://issues.apache.org/jira/browse/HBASE-1295
> >     Project: Hadoop HBase
> >  Issue Type: New Feature
> >    Reporter: Andrew Purtell
> > Attachments: hbase_repl.1.pdf
> >
> >
> > HBase should consider supporting a federated deployment where someone might have terascale (or beyond) clusters in more than one geography and would want the system to handle replication between the clusters/regions. It would be sweet if HBase had something on the roadmap to sync between replicas out of the box.
> > Consider if rows, columns, or even cells could be scoped: local, or global.
> > Then, consider a background task on each cluster that replicates new globally scoped edits to peer clusters. The HBase/Bigtable data model has convenient features (timestamps, multiversioning) such that simple exchange of globally scoped cells would be conflict free and would "just work". Implementation effort here would be in producing an efficient mechanism for collecting up edits from all the HRS and transmitting the edits over the network to peers, where they would then be split out to the HRS there. Holding on to the edit trace and tracking it until the remote commits succeed would also be necessary. So, HLog is probably the right place to set up the tee. This would be filtered log shipping, basically.
> >
> > This proposal does not consider transactional tables. For transactional tables, enforcement of global mutation commit ordering would come into the picture if the user wants the transaction to span the federation. This should be an optional feature even with transactional tables themselves being optional, because of how slow it would be.
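The "conflict free" claim in the quoted proposal can be sketched concretely: because every cell carries a timestamp and the store is multiversioned, applying a remote cell is just inserting it at its timestamp, which makes the merge commutative and idempotent, so peers converge regardless of arrival order. The class and method names below are illustrative, not HBase internals; this models the versions of a single (row, column).

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of one (row, column) in a multiversioned store: versions
// keyed by timestamp, newest first. Applying replicated cells in any
// order, any number of times, yields the same final state.
public class CellMerge {
    private final NavigableMap<Long, String> versions =
        new TreeMap<>(Comparator.reverseOrder());

    // apply a locally written or remotely replicated cell
    void apply(long timestamp, String value) {
        versions.putIfAbsent(timestamp, value); // idempotent on replay
    }

    // a read returns the newest version, exactly as Bigtable/HBase do
    String latest() {
        return versions.firstEntry().getValue();
    }
}
```

Note the caveat this glosses over: two clusters writing the same cell at the same timestamp is a genuine conflict, which the timestamp scheme resolves arbitrarily rather than detects.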
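The "holding on to the edit trace until the remote commits succeed" point, combined with the WAL buildup discussed for the outage case, amounts to per-peer retention bookkeeping: a WAL file may only be deleted once every peer has acknowledged it. A minimal sketch of that accounting, with illustrative names and no actual HBase types:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Per-peer queues of WAL files awaiting remote commit. During a peer
// outage its queue simply grows (the backlog described above); when the
// peer returns and acknowledges a position, earlier files become
// eligible for deletion.
public class WalBacklog {
    private final Map<String, Deque<String>> pending = new HashMap<>();

    // a new WAL file must be shipped to this peer
    void enqueue(String peer, String walFile) {
        pending.computeIfAbsent(peer, p -> new ArrayDeque<>()).add(walFile);
    }

    // peer acknowledged everything up to and including upToWal;
    // returns the files this peer no longer needs
    List<String> ack(String peer, String upToWal) {
        List<String> done = new ArrayList<>();
        Deque<String> q = pending.getOrDefault(peer, new ArrayDeque<>());
        while (!q.isEmpty()) {
            String f = q.poll();
            done.add(f);
            if (f.equals(upToWal)) break;
        }
        return done;
    }

    // outstanding files for a peer (grows unboundedly during an outage)
    int backlog(String peer) {
        return pending.getOrDefault(peer, new ArrayDeque<>()).size();
    }
}
```

A real implementation would also need the cross-peer intersection (a file is deletable only when absent from every peer's queue) and some bound or operator alert on backlog growth, since the extended-outage case otherwise fills the disk.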
