[
https://issues.apache.org/jira/browse/HBASE-11715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098940#comment-14098940
]
Jean-Marc Spaggiari commented on HBASE-11715:
---------------------------------------------
{quote}
1. How is this table copied? Do we flush and just move the HFiles over?
{quote}
Copying the table is out of scope for this issue. This is just a tool to
compare the content of two tables.
{quote}
2. What do we do if they are not equivalent? Is it enough to throw an error, or
do we need to say what part of the table isn't equivalent?
{quote}
We report the information back to the user. For example: for range A to C, the
content is different between the 2 tables.
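
As a rough illustration of that kind of report, here is a minimal sketch. It assumes each table has already been reduced to a mapping from a (start key, end key) range to a digest; the function name `report_differences` and the sample digests are hypothetical and not part of any HBase API.

```python
def report_differences(source_digests, target_digests):
    """Return one message per range whose digest differs between the tables.

    Both arguments map a (start_key, end_key) tuple to a digest string.
    A range that is missing on the target side is reported as different.
    """
    messages = []
    for (start, end), digest in sorted(source_digests.items()):
        if target_digests.get((start, end)) != digest:
            messages.append(
                f"for range {start} to {end}, content is different "
                "between the 2 tables"
            )
    return messages


# Hypothetical digests, for illustration only.
source = {("A", "C"): "1f3a", ("C", "F"): "9b2e"}
target = {("A", "C"): "77d0", ("C", "F"): "9b2e"}

for line in report_differences(source, target):
    print(line)
```

Only the ranges that disagree are reported, so the user knows which part of the table needs a closer look.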
{quote}
3. Do Merkle trees make sense for this type of thing?
{quote}
Not sure. We don't have any tree structure here.
{quote}
I am interested in working on this task. With a Merkle tree, we would need to
constantly run some background service, and it would require an additional
amount of data.
{quote}
I don't think a Merkle tree is the right option here. But you can still
evaluate it.
{quote}
Can you provide more details? I can assign it to myself and work on this.
{quote}
Sure! Let's go for it.
> HBase should provide a tool to compare 2 remote tables.
> -------------------------------------------------------
>
> Key: HBASE-11715
> URL: https://issues.apache.org/jira/browse/HBASE-11715
> Project: HBase
> Issue Type: New Feature
> Components: util
> Reporter: Jean-Marc Spaggiari
>
> As discussed on the mailing list, when a table is copied to another cluster
> and needs to be validated against the first one, only VerifyReplication can be
> used. However, this can take a very long time since the data needs to be
> copied again.
> We should provide an easier and faster way to compare the tables.
> One option is to calculate hashes per range. The user can define the number of
> buckets, then we split the table into this number of buckets and calculate a
> hash for each (like the partitioner is already doing). We can also optionally
> calculate an overall CRC to further reduce the chance of hash collisions.
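
The per-range hashing idea in the description above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the tool's design: rows are modelled as in-memory `(row_key, value)` byte pairs already sorted by key (as a scan would return them), the bucket boundaries are passed in as `split_keys`, SHA-256 stands in for whatever hash would actually be chosen, and the name `bucket_digests` is hypothetical.

```python
import bisect
import hashlib
import zlib


def bucket_digests(rows, split_keys):
    """Hash sorted (row_key, value) pairs into len(split_keys) + 1 buckets.

    Returns (list of per-bucket hex digests, overall CRC32).
    Bucket i holds the keys k with split_keys[i-1] <= k < split_keys[i].
    """
    hashers = [hashlib.sha256() for _ in range(len(split_keys) + 1)]
    crc = 0
    for key, value in rows:
        idx = bisect.bisect_right(split_keys, key)
        for part in (key, value):
            # Length-prefix each field so (b"ab", b"c") and (b"a", b"bc")
            # do not produce the same byte stream.
            chunk = len(part).to_bytes(4, "big") + part
            hashers[idx].update(chunk)
            crc = zlib.crc32(chunk, crc)
    return [h.hexdigest() for h in hashers], crc


# Two small stand-in "tables" that differ in a single row.
table_a = [(b"row1", b"x"), (b"row5", b"y"), (b"row9", b"z")]
table_b = [(b"row1", b"x"), (b"row5", b"CHANGED"), (b"row9", b"z")]
splits = [b"row4", b"row7"]  # three buckets

digests_a, crc_a = bucket_digests(table_a, splits)
digests_b, crc_b = bucket_digests(table_b, splits)

mismatched = [
    i for i, (a, b) in enumerate(zip(digests_a, digests_b)) if a != b
]
print(mismatched)
```

Each cluster would compute its digests locally, so only the small list of digests (plus the optional CRC) has to cross the network, and only the mismatched buckets need further inspection.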
--
This message was sent by Atlassian JIRA
(v6.2#6252)