[ 
https://issues.apache.org/jira/browse/HBASE-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707924#action_12707924
 ] 

Billy Pearson commented on HBASE-1295:
--------------------------------------

I was thinking about this, and there are some other things to consider, like
table splits. Will the regions be the same on both clusters? There is no
guarantee that compactions will happen at the same time or that a split will
find the same mid key.

I would think the master would be the ideal process to pull the logs and pass
them to the peer master, which can then split the logs into regions and pass
the edits on to the servers hosting those regions.
I would like to see sequential processing of the edits on the peer so
everything is applied in the same order, which is how we store the WALs now.
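
A rough sketch of the peer-side split, assuming the peer routes each edit by
its own region start keys (since the peer's regions may not match the source's)
and keeps source log order within each region. The Edit fields, the start keys
and the function names are made up for illustration and are not HBase APIs.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import NamedTuple


class Edit(NamedTuple):
    seq: int    # position of the edit in the source log (hypothetical field)
    row: str
    value: str


def region_index(row, start_keys):
    """Return the index of the peer region whose key range contains row.

    start_keys is the sorted list of region start keys; the first is "".
    """
    return bisect_right(start_keys, row) - 1


def split_for_peer(edits, start_keys):
    """Group edits by peer region, preserving source log order per region."""
    per_region = defaultdict(list)
    for edit in sorted(edits, key=lambda e: e.seq):
        per_region[region_index(edit.row, start_keys)].append(edit)
    return dict(per_region)


# The peer has three regions: ["", "g"), ["g", "p"), ["p", end).
peer_start_keys = ["", "g", "p"]
log = [
    Edit(3, "b", "v3"),
    Edit(1, "a", "v1"),
    Edit(2, "m", "v2"),
    Edit(4, "g", "v4"),
]
batches = split_for_peer(log, peer_start_keys)
```

Each value in batches is a list that a region server on the peer could apply
front to back.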

I am not sure what the current status of appends on HDFS is, but if we had
that 100% working, the master could just remember where in the WAL it had read
up to and poll every x seconds to see if there are any updates. Then we would
not have to worry about waiting for a log to roll, which could be a while in
some cases. Waiting for a log to roll before the updates get pushed to the
peers seems like the wrong way to go with this, but it might be the only way
we have now if append is not working right in HDFS.
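
A minimal sketch of the "remember the offset and poll" idea, using a local
file with newline-delimited records as a stand-in for a WAL on HDFS. The
LogTailer class and the record format are assumptions for illustration; a real
reader would have to cope with the actual log format and with whatever append
visibility HDFS provides.

```python
import os
import tempfile


class LogTailer:
    """Remembers how far into a growing log file it has read."""

    def __init__(self, path):
        self.path = path
        self.offset = 0  # byte offset of the first unread record

    def poll(self):
        """Return records appended since the last poll.

        Only complete (newline-terminated) records are consumed, so a
        half-written record is left for the next poll.
        """
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        end = data.rfind(b"\n") + 1  # 0 when no complete record is present
        self.offset += end
        return data[:end].decode().splitlines()


# Demo: a caller would normally invoke poll() every x seconds.
path = os.path.join(tempfile.mkdtemp(), "wal.log")
with open(path, "wb") as f:
    f.write(b"edit-1\nedit-2\npart")   # last record is incomplete
tailer = LogTailer(path)
first = tailer.poll()
with open(path, "ab") as f:
    f.write(b"ial\n")                  # writer finishes the record
second = tailer.poll()
third = tailer.poll()                  # nothing new since the last poll
```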

As for a first sync for the peers, it would be a huge saving if we could do a
rolling read-only mode on the regions: flush the memcache, copy the needed
files, unlock the region, and start the transfer to the peer. This would allow
a one-by-one copy of the regions to the remote, and the only bottleneck would
be the site-to-site bandwidth. In the meantime, the peer could be holding
edits and waiting for all regions to get copied, and then start the replay of
the logs, skipping any edit that is older than the timestamp of the copy. I
think that timestamp could be written in the HFile now as metadata.
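
A small sketch of the replay step, assuming the peer has buffered edits as
(timestamp, region, row, value) tuples and knows the copy timestamp for each
region. The tuple layout and the names are hypothetical. Edits stamped exactly
at the copy time are replayed here rather than skipped, on the assumption that
re-applying a cell with the same timestamp is harmless.

```python
def replay_after_copy(buffered_edits, copy_ts_by_region):
    """Return the buffered edits that still need applying on the peer.

    An edit older than the copy timestamp of its region is already contained
    in the copied files, so it is skipped.
    """
    to_apply = []
    for ts, region, row, value in buffered_edits:
        if ts < copy_ts_by_region[region]:
            continue  # already present in the copied region files
        to_apply.append((ts, region, row, value))
    return to_apply


copy_ts = {"r1": 120, "r2": 120}
buffered = [
    (100, "r1", "a", "x"),  # older than the r1 copy: skipped
    (150, "r1", "a", "y"),  # newer than the r1 copy: replayed
    (120, "r2", "m", "z"),  # same timestamp as the r2 copy: replayed
]
replayed = replay_after_copy(buffered, copy_ts)
```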

Just some suggestions and/or other thoughts



> Federated HBase
> ---------------
>
>                 Key: HBASE-1295
>                 URL: https://issues.apache.org/jira/browse/HBASE-1295
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>         Attachments: hbase_repl.2.odp, hbase_repl.2.pdf
>
>
> HBase should consider supporting a federated deployment where someone might 
> have terascale (or beyond) clusters in more than one geography and would want 
> the system to handle replication between the clusters/regions. It would be 
> sweet if HBase had something on the roadmap to sync between replicas out of 
> the box. 
> Consider if rows, columns, or even cells could be scoped: local, or global.
> Then, consider a background task on each cluster that replicates new globally 
> scoped edits to peer clusters. The HBase/Bigtable data model has convenient 
> features (timestamps, multiversioning) such that simple exchange of globally 
> scoped cells would be conflict free and would "just work". Implementation 
> effort here would be in producing an efficient mechanism for collecting up 
> edits from all the HRS and transmitting the edits over the network to peers 
> where they would then be split out to the HRS there. Holding on to the edit 
> trace and tracking it until the remote commits succeed would also be 
> necessary. So, HLog is probably the right place to set up the tee. This would 
> be filtered log shipping, basically.  
> This proposal does not consider transactional tables. For transactional 
> tables, enforcement of global mutation commit ordering would come into the 
> picture if the user wants the transaction to span the federation. This
> should be an optional feature even with transactional tables themselves being 
> optional because of how slow it would be.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
