[jira] [Commented] (PHOENIX-3817) VerifyReplication using SQL

Vincent Poon (JIRA) Tue, 25 Sep 2018 14:13:12 -0700


    [ https://issues.apache.org/jira/browse/PHOENIX-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627932#comment-16627932 ]


Vincent Poon commented on PHOENIX-3817:
---------------------------------------

Sorry I'm late to the party here, but this looks like a nice addition, 
[~akshita.malhotra].  The overall framework looks very similar to the 
IndexScrutinyTool, though that tool can't do the efficient parallel scans on 
source and target that you do here.  I wish we could've come up with a common 
"CompareTableTool" framework or something to unify it all, but that can be a 
refactor for another day.  A few things I noticed:

1)  It seems you set the targetStartRow and targetStopRow based on the first 
source value and the source split's upperRange.  Would that handle these cases:

   a)  a row in the target table that falls in the same range as the source 
split (which the mapper is iterating over), but whose rowkey comes before the 
first source rowkey

   b)  a key range in the target table that does not belong to any of the 
source split ranges
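
To make the two cases concrete, here is a toy sketch.  It is not code from the 
patch: String keys stand in for rowkeys, the class and method names are made 
up, and it assumes a single source split ["b", "f") whose first source row is 
"c".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: String keys stand in for HBase rowkeys, and every name
// here is hypothetical rather than taken from the patch.
class TargetRangeSketch {

    // Returns the target keys that a scan over [startRow, stopRow) would visit.
    static List<String> scan(List<String> sortedTargetKeys, String startRow, String stopRow) {
        List<String> visited = new ArrayList<>();
        for (String key : sortedTargetKeys) {
            if (key.compareTo(startRow) >= 0 && key.compareTo(stopRow) < 0) {
                visited.add(key);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Assumed source split: ["b", "f"); the first source row in it is "c".
        List<String> target = Arrays.asList("a", "b", "c", "d", "g");

        // Start at the first source row: target row "b" (case a) is never visited.
        System.out.println(scan(target, "c", "f"));

        // Start at the split's lower bound instead: "b" is now visited.
        System.out.println(scan(target, "b", "f"));

        // "a" and "g" lie outside the only split (case b) and are skipped either way.
    }
}
```

If the real mapper behaves like the first call, extra rows in the target such 
as "b", "a" and "g" would never be reported.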

2) There's this code snippet in getQueryPlan:
{code:java}
// Optimize the query plan so that we potentially use secondary indexes
final QueryPlan queryPlan = pstmt.optimizeQuery(selectStatement);
{code}
    I think using secondary indexes for your case would be inefficient or 
incorrect.  Do you need to explicitly prevent this for your use case?
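
    If it does need to be prevented, one option might be Phoenix's NO_INDEX 
hint on the SELECT.  A minimal sketch, assuming the tool holds the query as a 
plain string; withNoIndexHint is a hypothetical helper, not something in the 
patch, and it does not handle queries that already carry a hint:

```java
// Illustration only: a hypothetical helper that adds Phoenix's NO_INDEX hint
// to a SELECT statement so the optimizer does not pick a secondary index.
class NoIndexHintSketch {

    static String withNoIndexHint(String select) {
        String trimmed = select.trim();
        // Case-insensitive check that the statement starts with SELECT.
        if (!trimmed.regionMatches(true, 0, "SELECT", 0, 6)) {
            throw new IllegalArgumentException("expected a SELECT statement");
        }
        // Keep everything after the SELECT keyword and splice the hint in.
        return "SELECT /*+ NO_INDEX */" + trimmed.substring(6);
    }

    public static void main(String[] args) {
        System.out.println(withNoIndexHint("SELECT * FROM MY_TABLE WHERE TENANT_ID = 'a'"));
    }
}
```

    Whether a hint or simply skipping the index-aware optimization is the 
better fit probably depends on how the rest of getQueryPlan is structured.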

3)  Do we need to add an early exit failsafe?  I'm thinking of a case where an 
operator runs the tool on two totally different tables.  If the tables are 
huge, just the logging output alone would be overwhelming.  Maybe we can set an 
upper bound on the BAD/MISSING counters.
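
Something along these lines is what I have in mind.  It is only a sketch: the 
class name and the limit are made up, and it tracks a single mapper's count 
rather than the job-wide counters, which would need more plumbing.

```java
// Illustration only: a hypothetical per-mapper budget for BAD/MISSING rows.
class MismatchBudgetSketch {
    private final long maxMismatches;
    private long mismatches;

    MismatchBudgetSketch(long maxMismatches) {
        if (maxMismatches < 0) {
            throw new IllegalArgumentException("maxMismatches must be >= 0");
        }
        this.maxMismatches = maxMismatches;
    }

    // Records one BAD or MISSING row. Returns false once the budget is
    // exceeded, which the caller can treat as the signal to stop early.
    boolean record() {
        mismatches++;
        return mismatches <= maxMismatches;
    }

    long count() {
        return mismatches;
    }

    public static void main(String[] args) {
        MismatchBudgetSketch budget = new MismatchBudgetSketch(2);
        for (int row = 0; row < 5; row++) {
            if (!budget.record()) {
                System.out.println("Too many mismatches, stopping after " + budget.count());
                break;
            }
        }
    }
}
```

The caller would log each bad row only while record() returns true, and abort 
the scan the first time it returns false.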

A future enhancement might be to write the bad/missing rowkeys to a Phoenix 
table, such that they're queryable.

Overall, very well done - especially on keeping the code readable!

> VerifyReplication using SQL
> ---------------------------
>
>                 Key: PHOENIX-3817
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3817
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Alex Araujo
>            Assignee: Akshita Malhotra
>            Priority: Minor
>             Fix For: 4.15.0
>
>         Attachments: PHOENIX-3817-final.patch, PHOENIX-3817-final2.patch, 
> PHOENIX-3817.v1.patch, PHOENIX-3817.v2.patch, PHOENIX-3817.v3.patch, 
> PHOENIX-3817.v4.patch, PHOENIX-3817.v5.patch, PHOENIX-3817.v6.patch, 
> PHOENIX-3817.v7.patch
>
>
> Certain use cases may copy or replicate a subset of a table to a different 
> table or cluster. For example, application topologies may map data for 
> specific tenants to different peer clusters.
> It would be useful to have a Phoenix VerifyReplication tool that accepts an 
> SQL query, a target table, and an optional target cluster. The tool would 
> compare data returned by the query on the different tables and update various 
> result counters (similar to HBase's VerifyReplication).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
