[ https://issues.apache.org/jira/browse/PHOENIX-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500599#comment-16500599 ]
Akshita Malhotra commented on PHOENIX-3817:
-------------------------------------------

[~alexaraujo] From the tests I have run, it seems that the Multi-Table RecordReader approach makes certain assumptions. For example, the start row of a target region scan is set from the source scan's start row. If the target start row is strictly greater and the target scan is smaller than the source scan (a subset scenario), this approach fails to determine the correct number of good/bad rows. Similarly, it yields incorrect results if there are holes in the target scan, which is a likely error scenario when a MapReduce job discards nondeterministically processed rows (not very likely in our migration scenario, but possible with M/R in general).

I was going through the HBase VerifyReplication approach. One way to resolve these issues would be to do something similar: for every source row processed, find the corresponding target scan (start row = current source row, end row = source split end row), thereby eliminating the need for a multi-table record reader.

fyi, [~gjacoby]

> VerifyReplication using SQL
> ---------------------------
>
>                 Key: PHOENIX-3817
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3817
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Alex Araujo
>            Assignee: Alex Araujo
>            Priority: Minor
>             Fix For: 4.15.0
>
>         Attachments: PHOENIX-3817.v1.patch, PHOENIX-3817.v2.patch, PHOENIX-3817.v3.patch, PHOENIX-3817.v4.patch
>
>
> Certain use cases may copy or replicate a subset of a table to a different
> table or cluster. For example, application topologies may map data for
> specific tenants to different peer clusters.
> It would be useful to have a Phoenix VerifyReplication tool that accepts an
> SQL query, a target table, and an optional target cluster. The tool would
> compare data returned by the query on the different tables and update various
> result counters (similar to HBase's VerifyReplication).
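The per-row lookup suggested in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from any of the attached patches: the source and target tables are modeled as in-memory sorted maps rather than Phoenix or HBase scans, and the class name `PerRowVerifySketch`, the method `verifySplit`, and the `good`/`bad` counters are hypothetical names chosen for the example.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical sketch: sorted maps stand in for source and target table scans.
public class PerRowVerifySketch {

    /** Illustrative result counters. */
    public static final class Counts {
        public long good;
        public long bad;
    }

    /**
     * For every source row in [splitStart, splitEnd), open a "target scan"
     * with start row = current source row and end row = split end row,
     * then compare the first target row returned with the source row.
     * Assumes splitStart <= splitEnd; otherwise subMap throws.
     */
    public static Counts verifySplit(NavigableMap<String, String> source,
                                     NavigableMap<String, String> target,
                                     String splitStart, String splitEnd) {
        Counts counts = new Counts();
        NavigableMap<String, String> sourceSplit =
                source.subMap(splitStart, true, splitEnd, false);
        for (Map.Entry<String, String> src : sourceSplit.entrySet()) {
            // Target scan bounded by the current source key and the split end.
            Map.Entry<String, String> tgt =
                    target.subMap(src.getKey(), true, splitEnd, false).firstEntry();
            boolean match = tgt != null
                    && tgt.getKey().equals(src.getKey())
                    && Objects.equals(tgt.getValue(), src.getValue());
            if (match) {
                counts.good++;
            } else {
                // Row is missing from the target (a hole) or its value differs.
                counts.bad++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> source = new TreeMap<>();
        source.put("a", "1");
        source.put("b", "2");
        source.put("c", "3");
        source.put("d", "4");

        // Target with a hole at row "b".
        NavigableMap<String, String> target = new TreeMap<>();
        target.put("a", "1");
        target.put("c", "3");
        target.put("d", "4");

        Counts counts = verifySplit(source, target, "a", "e");
        System.out.println("good=" + counts.good + " bad=" + counts.bad); // prints "good=3 bad=1"
    }
}
```

Because each target lookup is anchored at the current source key, a hole in the target or a target that starts later than the source shifts no later comparisons. The sketch does not count rows that exist only in the target, which a real tool would likely need to handle separately.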