[ https://issues.apache.org/jira/browse/HBASE-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573319#comment-14573319 ]
Dave Latham commented on HBASE-13639:
-------------------------------------
We've used this tool to repair some very large tables across a WAN link. It
can be challenging to run against a table that is receiving live writes if
those writes update or overwrite existing cells. In general, you can restrict
the run to a time range so that new writes are ignored. But if those writes
update existing cells, the time range scan may or may not see the older
versions of those cells, depending on whether a major compaction has already
removed them, and that can differ between the local and remote clusters.
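
To make the time range caveat concrete, here is a toy model in plain Python.
It is not HBase code: the ToyTable class, its method names, and the cell key
"row1/cf:q" are all hypothetical, and it assumes a table that keeps one
version per cell, with excess versions dropped only at major compaction.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical stand-in for one cluster's copy of a table."""
    max_versions: int = 1
    cells: dict = field(default_factory=dict)  # cell key -> [(timestamp, value)]

    def put(self, key, ts, value):
        self.cells.setdefault(key, []).append((ts, value))

    def major_compact(self):
        # Assumption: excess versions are physically dropped only here.
        for versions in self.cells.values():
            versions.sort(reverse=True)
            del versions[self.max_versions:]

    def scan(self, min_ts, max_ts):
        # Newest version of each cell with min_ts <= timestamp < max_ts.
        result = {}
        for key, versions in self.cells.items():
            in_range = [(ts, v) for ts, v in versions if min_ts <= ts < max_ts]
            if in_range:
                result[key] = max(in_range)
        return result


source, target = ToyTable(), ToyTable()
for table in (source, target):
    table.put("row1/cf:q", 100, "old")  # original write
    table.put("row1/cf:q", 200, "new")  # live overwrite during the sync window

source.major_compact()  # only the source cluster has compacted so far

source_view = source.scan(0, 150)  # time range chosen to exclude the new write
target_view = target.scan(0, 150)
print(source_view)
print(target_view)
```

Both toy tables received exactly the same writes, yet the two scans disagree:
the compacted copy no longer holds the version at timestamp 100, while the
uncompacted copy still returns it. A comparison over that time range would
report a difference that is not a real data divergence.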
> SyncTable - rsync for HBase tables
> ----------------------------------
>
> Key: HBASE-13639
> URL: https://issues.apache.org/jira/browse/HBASE-13639
> Project: HBase
> Issue Type: New Feature
> Reporter: Dave Latham
> Assignee: Dave Latham
> Fix For: 2.0.0, 0.98.14, 1.2.0
>
> Attachments: HBASE-13639-0.98.patch, HBASE-13639-v1.patch,
> HBASE-13639-v2.patch, HBASE-13639-v3.patch, HBASE-13639.patch
>
>
> Given HBase tables in remote clusters with similar but not identical data,
> efficiently update a target table such that the data in question is identical
> to a source table. Efficiency in this context means using far less network
> traffic than would be required to ship all the data from one cluster to the
> other. Takes inspiration from rsync.
> Design doc:
> https://docs.google.com/document/d/1-2c9kJEWNrXf5V4q_wBcoIXfdchN7Pxvxv1IO6PW0-U/
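
As an illustration of how an rsync-like approach can save network traffic,
the sketch below compares hashes of key ranges and ships rows only for ranges
whose hashes differ. It is a toy in plain Python under stated assumptions, not
the implementation in the attached patches: the function names, the in-memory
dicts standing in for tables, and the range boundaries are all hypothetical.

```python
import hashlib


def range_hashes(rows, boundaries):
    """Hash the rows that fall in each [start, stop) key range."""
    hashes = {}
    for start, stop in zip(boundaries, boundaries[1:]):
        digest = hashlib.sha256()
        for key in sorted(rows):
            if start <= key < stop:
                digest.update(key.encode() + b"\x00")
                digest.update(rows[key].encode() + b"\x00")
        hashes[(start, stop)] = digest.hexdigest()
    return hashes


def sync(source, target, boundaries):
    """Make target match source; return how many rows had to be shipped."""
    source_hashes = range_hashes(source, boundaries)
    target_hashes = range_hashes(target, boundaries)
    shipped = 0
    for (start, stop), source_hash in source_hashes.items():
        if target_hashes[(start, stop)] == source_hash:
            continue  # range already identical: only the hash was exchanged
        # Range differs: replace the target's rows in this range.
        for key in [k for k in target if start <= k < stop]:
            del target[key]
        for key, value in source.items():
            if start <= key < stop:
                target[key] = value
                shipped += 1
    return shipped


source = {"a1": "x", "a2": "y", "m1": "z", "t1": "w"}
target = {"a1": "x", "a2": "y", "m1": "STALE", "t1": "w"}
shipped = sync(source, target, ["a", "h", "p", "~"])
print(shipped, target == source)
```

Only the one range containing the stale row is transferred. The other ranges
cost a hash comparison each, which is the sense in which far less data crosses
the link than a full copy would require.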
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)