[jira] Commented: (HBASE-1295) Federated HBase

Andrew Purtell (JIRA) Mon, 11 May 2009 08:52:08 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708085#action_12708085
 ]


Andrew Purtell commented on HBASE-1295:
---------------------------------------

@Billy:

I don't follow what you are saying about splits. It won't matter where a table 
is split, on either the source or the peer cluster. The replication process 
does not care about such details. It would send edits from the WALs to peers, 
where they would be applied as if an HRS there were receiving ordinary local 
batch updates. 
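
To illustrate why split points are irrelevant, here is a rough sketch in 
Python. It is not HBase code: PeerCluster, replicate, and the tuple layout of 
an edit are all made up for the example. The only assumption is that a peer 
routes each incoming edit by row key against its own region boundaries, the 
same way it would route a local update.

```python
import bisect


class PeerCluster:
    """Hypothetical peer: routes edits by row key using its own split points."""

    def __init__(self, split_keys):
        self.split_keys = sorted(split_keys)
        # Region i holds rows in [split_keys[i-1], split_keys[i]).
        self.regions = [dict() for _ in range(len(self.split_keys) + 1)]

    def apply(self, edit):
        row, column, timestamp, value = edit
        # A row equal to a split key belongs to the region that starts there.
        idx = bisect.bisect_right(self.split_keys, row)
        self.regions[idx][(row, column, timestamp)] = value

    def cells(self):
        """All cells held by the cluster, ignoring which region stores them."""
        merged = {}
        for region in self.regions:
            merged.update(region)
        return merged


def replicate(wal_edits, peer):
    """Ship edits to a peer; the sender knows nothing about the peer's regions."""
    for edit in wal_edits:
        peer.apply(edit)


# The same stream of WAL edits sent to two peers that are split differently.
wal = [("apple", "f:a", 1, "x"), ("mango", "f:a", 2, "y"), ("zebra", "f:a", 3, "z")]
peer_a = PeerCluster(split_keys=["m"])
peer_b = PeerCluster(split_keys=["b", "n", "t"])
replicate(wal, peer_a)
replicate(wal, peer_b)
```

Both peers end up holding the same set of cells even though their regions are 
laid out differently, which is all the sender needs.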

Also, according to the proposal, the master would not be involved in 
replication. The proposal considers more than one HRS -- self-elected via ZK -- 
working in a fault tolerant way to forward edits sent by all the other HRS on 
to the peer cluster. There is no SPOF in the replication process.

Also, I disagree that waiting for HLog roll is the wrong way to go. There is no 
reason a log roll cannot happen once per minute, or every five minutes, or at 
whatever the configured replication period is. Because only closed logs would 
be shipped, we would not care whether append or sync is properly implemented 
in the underlying FS. Given how those issues are progressing in HDFS, we may 
have a working replication process before HDFS has a working append. 
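
As a rough sketch of the idea (Python, purely illustrative: RollingLog, 
roll_period_secs, and take_closed are names invented here, not HLog APIs), 
the log is rolled once the replication period has elapsed and only closed 
segments are ever handed to replication. The open segment is never read, so 
nothing depends on append or sync.

```python
class RollingLog:
    """Hypothetical write-ahead log that rolls on a fixed period."""

    def __init__(self, roll_period_secs):
        self.roll_period_secs = roll_period_secs
        self.current = []      # open segment, never shipped
        self.opened_at = 0.0   # time the open segment was started
        self.closed = []       # closed segments waiting to be shipped

    def append(self, edit, now):
        # Roll first if the open segment is older than the period.
        if now - self.opened_at >= self.roll_period_secs:
            self.roll(now)
        self.current.append(edit)

    def roll(self, now):
        # Close the open segment (if it has edits) and start a new one.
        if self.current:
            self.closed.append(self.current)
        self.current = []
        self.opened_at = now

    def take_closed(self):
        """Hand over all closed segments for shipping to the peer."""
        shipped, self.closed = self.closed, []
        return shipped


log = RollingLog(roll_period_secs=60)
log.append("e1", now=0)
log.append("e2", now=30)
log.append("e3", now=61)   # period elapsed: e1 and e2 are closed out first
shipped = log.take_closed()
```

With a 60 second period the first two edits are shipped together after the 
roll, and the third waits in the open segment for the next roll. The 
replication lag is bounded by the roll period.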

> Federated HBase
> ---------------
>
>                 Key: HBASE-1295
>                 URL: https://issues.apache.org/jira/browse/HBASE-1295
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>         Attachments: hbase_repl.2.odp, hbase_repl.2.pdf
>
>
> HBase should consider supporting a federated deployment where someone might 
> have terascale (or beyond) clusters in more than one geography and would want 
> the system to handle replication between the clusters/regions. It would be 
> sweet if HBase had something on the roadmap to sync between replicas out of 
> the box. 
> Consider if rows, columns, or even cells could be scoped: local, or global.
> Then, consider a background task on each cluster that replicates new globally 
> scoped edits to peer clusters. The HBase/Bigtable data model has convenient 
> features (timestamps, multiversioning) such that simple exchange of globally 
> scoped cells would be conflict free and would "just work". Implementation 
> effort here would be in producing an efficient mechanism for collecting up 
> edits from all the HRS and transmitting the edits over the network to peers 
> where they would then be split out to the HRS there. Holding on to the edit 
> trace and tracking it until the remote commits succeed would also be 
> necessary. So, HLog is probably the right place to set up the tee. This would 
> be filtered log shipping, basically.  
> This proposal does not consider transactional tables. For transactional 
> tables, enforcement of global mutation commit ordering would come into the 
> picture if the user wants the transaction to span the federation. This 
> should be an optional feature even with transactional tables themselves being 
> optional because of how slow it would be.
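
On the quoted claim that exchanging globally scoped cells would be conflict 
free, a minimal sketch in Python may help. It assumes each cell version is 
identified by (row, column, timestamp) and that two clusters never write 
different values under the same identifier. The names merge and latest are 
made up for the example and are not HBase APIs.

```python
def merge(local, remote):
    """Union of two version sets keyed by (row, column, timestamp)."""
    merged = dict(local)
    merged.update(remote)
    return merged


def latest(cells, row, column):
    """Value of the newest version of a cell, or None if there is none."""
    versions = [(ts, value) for (r, c, ts), value in cells.items()
                if r == row and c == column]
    return max(versions)[1] if versions else None


# Each cluster accepted a write to the same cell at a different time.
cluster_a = {("r1", "f:q", 100): "from-a"}
cluster_b = {("r1", "f:q", 200): "from-b"}

# Exchanging edits in either direction gives the same result.
a_after = merge(cluster_a, cluster_b)
b_after = merge(cluster_b, cluster_a)
```

Both clusters converge on the same two versions, and a read of the newest 
version returns the same value on either side regardless of the order in 
which the edits arrived.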
