[ https://issues.apache.org/jira/browse/HBASE-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681357#comment-16681357 ]
Hudson commented on HBASE-15557:
--------------------------------

Results for branch master
[build #594 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/594/]: (x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/594//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/594//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/594//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}

> Add guidance on HashTable/SyncTable to the RefGuide
> ---------------------------------------------------
>
> Key: HBASE-15557
> URL: https://issues.apache.org/jira/browse/HBASE-15557
> Project: HBase
> Issue Type: Bug
> Components: documentation
> Affects Versions: 1.2.0
> Reporter: Sean Busbey
> Assignee: Wellington Chevreuil
> Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HBASE-15557.master.001.patch,
> HBASE-15557.master.002.patch
>
>
> The docs for SyncTable are insufficient. Brief description from [~davelatham]
> HBASE-13639 comment:
> {quote}
> Sorry for the lack of better documentation, Abhishek Soni. Thanks for
> bringing it up. I'll try to provide a better explanation. You may have
> already seen it, but if not, the design doc linked in the description above
> may also give you some better clues as to how it should be used.
> Briefly, the feature is intended to start with a pair of tables in remote
> clusters that are already substantially similar and make them identical by
> comparing hashes of the data and copying only the diffs instead of having to
> copy the entire table. So it is targeted at a very specific use case (with
> some work it could generalize to cover things like CopyTable and
> VerifyReplication but it's not there yet). To use it, you choose one table to
> be the "source", and the other table is the "target". After the process is
> complete the target table should end up being identical to the source table.
> In the source table's cluster, run
> org.apache.hadoop.hbase.mapreduce.HashTable and pass it the name of the
> source table and an output directory in HDFS. HashTable will scan the source
> table, break the data up into row key ranges (default of 8kB per range) and
> produce a hash of the data for each range.
> Make the hashes available to the target cluster - I'd recommend using DistCp
> to copy it across.
> In the target table's cluster, run
> org.apache.hadoop.hbase.mapreduce.SyncTable and pass it the directory where
> you put the hashes, and the names of the source and destination tables. You
> will likely also need to specify the source table's ZK quorum via the
> --sourcezkcluster option. SyncTable will then read the hash information, and
> compute the hashes of the same row ranges for the target table. For any row
> range where the hash fails to match, it will open a remote scanner to the
> source table, read the data for that range, and do Puts and Deletes to the
> target table to update it to match the source.
> I hope that clarifies it a bit. Let me know if you need a hand. If anyone
> wants to work on getting some documentation into the book, I can try to write
> some more but would love a hand on turning it into an actual book patch.
> {quote}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
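
The hash-and-compare flow described in the quoted comment can be illustrated with a small Python sketch. This is not the HBase implementation: plain dicts stand in for the source and target tables, MD5 is chosen only for illustration, and the names hash_table, hash_range and sync_table are hypothetical. The 8000-byte default mirrors the "8kB per range" mentioned above.

```python
import hashlib

def in_range(key, start, stop):
    # A range covers start <= key < stop; stop=None means "to the end of the table".
    return key >= start and (stop is None or key < stop)

def hash_range(rows, start, stop):
    # Hash every key/value in the range, in key order. Each part is
    # length-prefixed so that adjacent fields cannot run together.
    digest = hashlib.md5()
    for key in sorted(rows):
        if in_range(key, start, stop):
            for part in (key, rows[key]):
                digest.update(len(part).to_bytes(4, "big"))
                digest.update(part)
    return digest.hexdigest()

def hash_table(source, batch_size=8000):
    # "HashTable" step: cut the source into key ranges of roughly
    # batch_size bytes and record (start, stop, digest) for each one.
    starts, size = [b""], 0
    for key in sorted(source):
        if size >= batch_size:
            starts.append(key)
            size = 0
        size += len(key) + len(source[key])
    stops = starts[1:] + [None]
    return [(a, b, hash_range(source, a, b)) for a, b in zip(starts, stops)]

def sync_table(hashes, source, target):
    # "SyncTable" step: re-hash the same ranges on the target and touch
    # only the ranges whose digests differ. Returns how many ranges
    # had to be re-read from the source.
    repaired = 0
    for start, stop, source_digest in hashes:
        if hash_range(target, start, stop) == source_digest:
            continue
        repaired += 1
        wanted = {k: v for k, v in source.items() if in_range(k, start, stop)}
        stale = [k for k in target if in_range(k, start, stop) and k not in wanted]
        for key in stale:
            del target[key]        # plays the role of a Delete
        target.update(wanted)      # plays the role of Puts
    return repaired

# Toy data (hypothetical): 100 rows of 16 bytes each, so a 160-byte
# batch size yields ranges of 10 rows.
source = {b"row-%03d" % i: b"value-%03d" % i for i in range(100)}
target = dict(source)
target[b"row-005"] = b"stale"      # differing value
del target[b"row-042"]             # row missing on the target
target[b"row-077x"] = b"extra"     # row that exists only on the target

hashes = hash_table(source, batch_size=160)
repaired = sync_table(hashes, source, target)
```

In this toy run only the ranges containing the three differences are re-read from the source; the remaining ranges are skipped because their hashes already match, which is the saving the comment describes relative to copying the entire table.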