[ 
https://issues.apache.org/jira/browse/PHOENIX-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500599#comment-16500599
 ] 

Akshita Malhotra commented on PHOENIX-3817:
-------------------------------------------

[~alexaraujo] From the tests I have run, it seems the Multi-Table RecordReader 
approach makes certain assumptions that do not always hold. For example, the 
start row of a target region scan is set from the source scan's start row. If 
the target's actual start row is strictly greater and the target scan is 
smaller than the source scan (a subset scenario), this approach fails to 
determine the correct number of good/bad rows. Similarly, it yields incorrect 
results if there are holes in the target scan, which is a likely error scenario 
when a MapReduce job discards nondeterministically processed rows (not very 
likely in our migration scenario, but possible with M/R in general).

I was going through the HBase VerifyReplication approach. One way to resolve 
these issues would be to do something similar, i.e. for every source row 
processed, find the corresponding target scan (start row = current source row 
and end row = source split end row), thereby eliminating the need for a 
multi-table record reader.
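
To make the per-row lookup concrete, here is a minimal sketch over in-memory 
sorted lists rather than real HBase scans. The names scan, verify_split and 
the counter keys are hypothetical and only for illustration; they are not 
Phoenix or HBase APIs. For each source row it opens a target scan with 
start row = current source row and end row = source split end row, then 
checks whether the first row returned has the same key.

```python
from bisect import bisect_left


def scan(rows, start_row, end_row):
    """Yield (key, value) pairs with start_row <= key < end_row.

    `rows` is a list of (key, value) tuples sorted by key, standing in
    for a table scan (assumption: keys compare like HBase row keys).
    """
    keys = [k for k, _ in rows]
    i = bisect_left(keys, start_row)
    while i < len(rows) and rows[i][0] < end_row:
        yield rows[i]
        i += 1


def verify_split(source_rows, target_rows, split_start, split_end):
    """Compare one source split against the target, row by row."""
    counters = {"good": 0, "missing_in_target": 0, "content_different": 0}
    for key, value in scan(source_rows, split_start, split_end):
        # Target scan: start row = current source row, end row = split end.
        first = next(scan(target_rows, key, split_end), None)
        if first is None or first[0] != key:
            # Target starts later than the source row, or has a hole here.
            counters["missing_in_target"] += 1
        elif first[1] != value:
            counters["content_different"] += 1
        else:
            counters["good"] += 1
    return counters


# Target starts after the source ("a" is absent), has a hole at "c",
# and holds a different value for "d".
source = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
target = [("b", 2), ("d", 9)]
result = verify_split(source, target, "a", "e")
print(result)
# {'good': 1, 'missing_in_target': 2, 'content_different': 1}
```

Because every source row drives its own lookup, the subset and hole cases are 
counted per row instead of relying on the two scans staying aligned. Note that 
this sketch does not detect rows that exist only in the target.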

fyi, [~gjacoby]

> VerifyReplication using SQL
> ---------------------------
>
>                 Key: PHOENIX-3817
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3817
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Alex Araujo
>            Assignee: Alex Araujo
>            Priority: Minor
>             Fix For: 4.15.0
>
>         Attachments: PHOENIX-3817.v1.patch, PHOENIX-3817.v2.patch, 
> PHOENIX-3817.v3.patch, PHOENIX-3817.v4.patch
>
>
> Certain use cases may copy or replicate a subset of a table to a different 
> table or cluster. For example, application topologies may map data for 
> specific tenants to different peer clusters.
> It would be useful to have a Phoenix VerifyReplication tool that accepts an 
> SQL query, a target table, and an optional target cluster. The tool would 
> compare data returned by the query on the different tables and update various 
> result counters (similar to HBase's VerifyReplication).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
