[jira] [Comment Edited] (HBASE-13639) SyncTable - rsync for HBase tables

Andrew Purtell (JIRA) Thu, 14 May 2015 15:33:45 -0700

    [ https://issues.apache.org/jira/browse/HBASE-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544488#comment-14544488 ]


Andrew Purtell edited comment on HBASE-13639 at 5/14/15 10:32 PM:
------------------------------------------------------------------

+1

I tested this with two small clusters:

# Use LTT to initialize test tables on each cluster
# Use LTT to write 100000 rows starting from key 0 to cluster 1
# Run HashTable on cluster 1 writing hashes to cluster 2
# Run SyncTable on cluster 2 with sourcezkcluster=cluster 1
# Check row count on cluster 2, 100000 as expected
# Use LTT to write 100000 rows starting from key 100000 to cluster 1
# Run HashTable on cluster 1 writing hashes to cluster 2
# Run SyncTable on cluster 2 with sourcezkcluster=cluster 1
# Check row count on cluster 2, 200000 as expected
# Run LTT to update 20% of cells on cluster 1
# Run HashTable on cluster 1 writing hashes to cluster 2
# Run SyncTable on cluster 2 with sourcezkcluster=cluster 1
# Observed SyncTable pull updates from cluster 1 to cluster 2 and write back 
"old" cells from cluster 2 to cluster 1. Row count didn't change. The number of 
missing cells reported on source and target was ~20%.


was (Author: apurtell):
+1

I tested this with two small clusters:

# Use LTT to initialize test tables on each cluster
# Use LTT to write 100000 rows starting from key 0 to cluster 1
# Run HashTable on cluster 1 writing hashes to cluster 2
# Run SyncTable on cluster 2 with sourcezkcluster=cluster 1
# Check row count on cluster 2, 100000 as expected
# Use LTT to write 100000 rows starting from key 100000 to cluster 1
# Run HashTable on cluster 1 writing hashes to cluster 2
# Run SyncTable on cluster 2 with sourcezkcluster=cluster 1
# Check row count on cluster 2, 200000 as expected
# Run LTT to update 20% of cells on cluster 1
# Run HashTable on cluster 1 writing hashes to cluster 2
# Run SyncTable on cluster 2 with sourcezkcluster=cluster 1
# Observed SyncTable pull updates from cluster 1 to cluster 2 and write back 
"old" cells from cluster 2 to cluster 1. Row count didn't change. 

> SyncTable - rsync for HBase tables
> ----------------------------------
>
>                 Key: HBASE-13639
>                 URL: https://issues.apache.org/jira/browse/HBASE-13639
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Dave Latham
>            Assignee: Dave Latham
>             Fix For: 2.0.0, 0.98.14, 1.2.0
>
>         Attachments: HBASE-13639-0.98.patch, HBASE-13639-v1.patch, 
> HBASE-13639-v2.patch, HBASE-13639.patch
>
>
> Given HBase tables in remote clusters with similar but not identical data, 
> efficiently update a target table such that the data in question is identical 
> to a source table.  Efficiency in this context means using far less network 
> traffic than would be required to ship all the data from one cluster to the 
> other.  Takes inspiration from rsync.
> Design doc: 
> https://docs.google.com/document/d/1-2c9kJEWNrXf5V4q_wBcoIXfdchN7Pxvxv1IO6PW0-U/
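
The rsync-style idea in the description above can be illustrated with a small in-memory sketch. This is not the HashTable/SyncTable MapReduce implementation from the patch; the function names (range_digest, hash_table, sync_table), the dict-based "tables", and the batch size are all assumptions made for illustration. The source hashes its rows in key-ordered batches, the target hashes the same key ranges, and only batches whose digests differ are fetched from the source.

```python
import hashlib


def range_digest(table, start, stop):
    """Digest of all (key, value) pairs with start <= key < stop.

    stop=None means the range is unbounded above.
    """
    h = hashlib.sha256()
    for key in sorted(table):
        if key >= start and (stop is None or key < stop):
            h.update(key.encode() + b"\x00" + table[key].encode() + b"\x00")
    return h.hexdigest()


def hash_table(source, batch_size):
    """Source side (hypothetical helper): hash the key space in batches.

    Returns a list of (start_key, stop_key, digest). The first range starts
    at the empty string so that every possible target key is covered.
    """
    keys = sorted(source)
    starts = [""] + keys[batch_size::batch_size]
    ranges = []
    for i, start in enumerate(starts):
        stop = starts[i + 1] if i + 1 < len(starts) else None
        ranges.append((start, stop, range_digest(source, start, stop)))
    return ranges


def sync_table(source, target, ranges):
    """Target side (hypothetical helper): fetch only mismatching batches.

    Returns the number of source rows that had to be transferred.
    """
    fetched = 0
    for start, stop, digest in ranges:
        if range_digest(target, start, stop) == digest:
            continue  # batch already identical, nothing needs to be shipped

        def in_range(k, start=start, stop=stop):
            return k >= start and (stop is None or k < stop)

        src_rows = {k: v for k, v in source.items() if in_range(k)}
        fetched += len(src_rows)
        # Drop target rows the source does not have, then copy source rows.
        for k in [k for k in target if in_range(k) and k not in src_rows]:
            del target[k]
        target.update(src_rows)
    return fetched


# Demo: 1000 rows, with one stale cell and one missing row on the target.
source = {f"row{i:04d}": f"v{i}" for i in range(1000)}
target = dict(source)
target["row0500"] = "stale"
del target["row0900"]

ranges = hash_table(source, batch_size=100)
fetched = sync_table(source, target, ranges)
print(fetched, "of", len(source), "rows transferred")
```

In this toy setup only the two batches containing a difference are transferred; the other eight are skipped on the strength of their digests alone, which is where the network savings would come from.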



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
