[ 
https://issues.apache.org/jira/browse/HBASE-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573319#comment-14573319
 ] 

Dave Latham commented on HBASE-13639:
-------------------------------------

We've used this tool to repair some very large tables across a WAN link.  It 
can be challenging to run against a table that is receiving live writes if 
those writes are updates/overwrites.  In general, you can run it against a 
time range to ignore new writes.  But if those writes update existing cells, 
the time range scan may or may not see the older versions of those cells, 
depending on whether a major compaction has happened, and compaction state 
may differ between the source and remote clusters.
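
To make the time range problem concrete, here is a rough toy model in Python. 
It is not the HBase client API: ToyStore and its methods are hypothetical 
names invented for illustration. It assumes a column family that keeps one 
version, and that an overwritten version stays readable by a time range scan 
until a major compaction removes it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy stand-in for one cluster's copy of a table (hypothetical, not HBase)."""
    max_versions: int = 1
    cells: dict = field(default_factory=dict)  # key -> list of (timestamp, value)

    def put(self, key, timestamp, value):
        # Writes only append; old versions are not removed at write time.
        self.cells.setdefault(key, []).append((timestamp, value))

    def major_compact(self):
        # Assumed behavior: keep only the newest max_versions versions per cell.
        for versions in self.cells.values():
            versions.sort(reverse=True)
            del versions[self.max_versions:]

    def scan(self, min_ts, max_ts):
        # Newest surviving version of each cell within [min_ts, max_ts).
        result = {}
        for key, versions in self.cells.items():
            in_range = [v for v in versions if min_ts <= v[0] < max_ts]
            if in_range:
                result[key] = max(in_range)
        return result


source, target = ToyStore(), ToyStore()
for store in (source, target):
    store.put("row1/cf:q", 100, "old")  # original write
    store.put("row1/cf:q", 200, "new")  # live overwrite of the same cell

source.major_compact()  # only the source cluster has compacted

# Scan a time range that ends before the overwrite, to "ignore new writes".
source_view = source.scan(0, 150)
target_view = target.scan(0, 150)
print(source_view)  # {}
print(target_view)  # {'row1/cf:q': (100, 'old')}
```

In this model both stores hold the same current data, yet the time range 
scans disagree, so a comparison restricted to that range would report a 
difference that is only an artifact of compaction timing.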

> SyncTable - rsync for HBase tables
> ----------------------------------
>
>                 Key: HBASE-13639
>                 URL: https://issues.apache.org/jira/browse/HBASE-13639
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Dave Latham
>            Assignee: Dave Latham
>             Fix For: 2.0.0, 0.98.14, 1.2.0
>
>         Attachments: HBASE-13639-0.98.patch, HBASE-13639-v1.patch, 
> HBASE-13639-v2.patch, HBASE-13639-v3.patch, HBASE-13639.patch
>
>
> Given HBase tables in remote clusters with similar but not identical data, 
> efficiently update a target table such that the data in question is identical 
> to a source table.  Efficiency in this context means using far less network 
> traffic than would be required to ship all the data from one cluster to the 
> other.  Takes inspiration from rsync.
> Design doc: 
> https://docs.google.com/document/d/1-2c9kJEWNrXf5V4q_wBcoIXfdchN7Pxvxv1IO6PW0-U/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
