[jira] Commented: (HBASE-1295) Multi data center replication

Andrew Purtell (JIRA) Thu, 25 Jun 2009 14:58:34 -0700

    [ https://issues.apache.org/jira/browse/HBASE-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724315#action_12724315 ]


Andrew Purtell commented on HBASE-1295:
---------------------------------------

bq. I assume to get a new peer online we would have to have some kind of a 
read-lock with a flush or some kind of catchup mode that will export all data 
before x timestamp 

In my opinion, no. Edits are propagated from HLogs, so existing data would not 
be replicated. Bringing existing data over would be an application-specific 
consideration, and could perhaps be accomplished by forwarding it at the 
application level, via a MapReduce transfer job or a background value 
fetch-and-refresh strategy.
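
To make the distinction concrete, here is a rough sketch of the idea, not 
actual HBase code. The names Edit, Peer, ship_new_edits and backfill are all 
made up for illustration, and the "log" is just an in-memory list standing in 
for the HLog. It assumes a peer records the log position at which it was added.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edit:
    seq: int        # position in the (simulated) write-ahead log
    row: str
    column: str
    timestamp: int
    value: bytes


@dataclass
class Peer:
    name: str
    start_seq: int  # log position at the moment the peer was added
    store: dict = field(default_factory=dict)

    def apply(self, edit):
        # Cells are keyed by their full coordinates, including timestamp.
        self.store[(edit.row, edit.column, edit.timestamp)] = edit.value


def ship_new_edits(log, peer):
    """Forward only edits logged at or after the point the peer was added."""
    shipped = 0
    for edit in log:
        if edit.seq >= peer.start_seq:
            peer.apply(edit)
            shipped += 1
    return shipped


def backfill(source_store, peer):
    """Application-level copy of pre-existing cells, e.g. from a batch job."""
    copied = 0
    for key, value in source_store.items():
        if key not in peer.store:
            peer.store[key] = value
            copied += 1
    return copied
```

Log shipping alone leaves the older cells behind; the separate backfill pass 
is what brings the peer up to the full data set.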

bq. Also will this be a write one place read anywhere replication or write 
anywhere read anywhere replication

Write anywhere, read anywhere.

bq. Will edits be able to write to any site/cluster and get replication to all 
the peers?

Yes.
But I know that jgray at least would like some administrative control over 
the propagation details. This could be accomplished via local settings stored 
in each cluster's peer table, supplied as parameters to ADD PEER commands.
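
As a rough illustration of what a local peer table with per-peer settings 
might look like (purely hypothetical: the PeerTable class and the "enabled" 
and "tables" settings are invented here, and nothing about the real ADD PEER 
parameters has been decided):

```python
class PeerTable:
    """Local, per-cluster registry of peers and their propagation settings."""

    def __init__(self):
        self._peers = {}

    def add_peer(self, name, **settings):
        # Stand-in for an ADD PEER command; the settings are hypothetical.
        if name in self._peers:
            raise ValueError("peer already exists: " + name)
        self._peers[name] = dict(settings)

    def should_propagate(self, name, table):
        """Decide locally whether edits to `table` go to peer `name`."""
        settings = self._peers[name]
        if not settings.get("enabled", True):
            return False
        allowed = settings.get("tables")  # None means all tables
        return allowed is None or table in allowed
```

Because the settings live in each cluster's own peer table, two clusters 
could apply different propagation rules toward the same peer.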


> Multi data center replication
> -----------------------------
>
>                 Key: HBASE-1295
>                 URL: https://issues.apache.org/jira/browse/HBASE-1295
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>             Fix For: 0.21.0
>
>         Attachments: hbase_repl.3.odp, hbase_repl.3.pdf
>
>
> HBase should consider supporting a federated deployment where someone might 
> have terascale (or beyond) clusters in more than one geography and would want 
> the system to handle replication between the clusters/regions. It would be 
> sweet if HBase had something on the roadmap to sync between replicas out of 
> the box. 
> Consider if rows, columns, or even cells could be scoped: local, or global.
> Then, consider a background task on each cluster that replicates new globally 
> scoped edits to peer clusters. The HBase/Bigtable data model has convenient 
> features (timestamps, multiversioning) such that simple exchange of globally 
> scoped cells would be conflict free and would "just work". Implementation 
> effort here would be in producing an efficient mechanism for collecting up 
> edits from all the HRS and transmitting the edits over the network to peers 
> where they would then be split out to the HRS there. Holding on to the edit 
> trace and tracking it until the remote commits succeed would also be 
> necessary. So, HLog is probably the right place to set up the tee. This would 
> be filtered log shipping, basically.  
> This proposal does not consider transactional tables. For transactional 
> tables, enforcement of global mutation commit ordering would come into the 
> picture if the user wants the transaction to span the federation. This 
> should be an optional feature even with transactional tables themselves being 
> optional because of how slow it would be.
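
A small sketch of why exchanging timestamped cells can be order-independent, 
as the description above suggests. This is not HBase code: merge and latest 
are illustrative names, the store is a plain dict keyed by (row, column, 
timestamp), and the tie-break for two different values at identical 
coordinates (keep the larger byte string) is an assumption added here so the 
merge stays deterministic.

```python
def merge(store, edits):
    """Return a new store with the given edits applied.

    Each edit is ((row, column, timestamp), value). Distinct timestamps
    never overwrite each other, so the order of application is irrelevant.
    """
    merged = dict(store)
    for key, value in edits:
        current = merged.get(key)
        # Assumed tie-break for identical coordinates: keep the larger value.
        if current is None or value > current:
            merged[key] = value
    return merged


def latest(store, row, column):
    """Return the value with the newest timestamp for a cell, or None."""
    versions = [(ts, v) for (r, c, ts), v in store.items()
                if r == row and c == column]
    return max(versions)[1] if versions else None
```

Applying edits from cluster A and then cluster B yields the same store as 
applying B and then A, and a read of the latest version sees the edit with 
the highest timestamp either way.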

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
