[
https://issues.apache.org/jira/browse/PHOENIX-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627932#comment-16627932
]
Vincent Poon commented on PHOENIX-3817:
---------------------------------------
Sorry I'm late to the party here, but this looks like a nice addition,
[~akshita.malhotra]. The overall framework looks very similar to the
IndexScrutinyTool, though that tool can't do the efficient parallel scans on
source and target that you do here. I wish we could've come up with a common
"CompareTableTool" framework or something to unify it all, but that can be a
refactor for another day. A few things I noticed:
1) It seems you set targetStartRow and targetStopRow based on the first source
rowkey and the source split's upperRange. Would that handle these cases:
a) a row in the target table that falls within the range of the source
split (which the mapper is iterating over), but whose rowkey comes before the
first source rowkey?
b) a key range in the target table that does not belong to any of the
source split ranges?
2) There's this code snippet in getQueryPlan:
{code:java}
// Optimize the query plan so that we potentially use secondary indexes
final QueryPlan queryPlan = pstmt.optimizeQuery(selectStatement);
{code}
I think using secondary indexes for your case would be inefficient or
incorrect. Do you need to explicitly prevent this for your use case?
3) Do we need to add an early exit failsafe? I'm thinking of a case where an
operator runs the tool on two totally different tables. If the tables are
huge, just the logging output alone would be overwhelming. Maybe we can set an
upper bound on the BAD/MISSING counters.
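To make the upper-bound idea in point 3 concrete, here is a minimal sketch of what I have in mind. MismatchBudget and its methods are hypothetical names I made up for illustration; nothing like this exists in the patch. I'm assuming the mapper would call record() once for each BAD or MISSING row.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper, not part of the patch: caps how many mismatches get reported. */
final class MismatchBudget {
    private final long limit;
    private final AtomicLong seen = new AtomicLong();

    MismatchBudget(long limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        this.limit = limit;
    }

    /** Records one BAD or MISSING row; returns true while the row should still be logged. */
    boolean record() {
        return seen.incrementAndGet() <= limit;
    }

    /** True once more mismatches have been seen than the limit allows, i.e. time to stop. */
    boolean exhausted() {
        return seen.get() > limit;
    }

    /** Total mismatches recorded so far, including those past the limit. */
    long count() {
        return seen.get();
    }
}
```

Each mapper would hold its own instance, so this only bounds the output per task. A job-wide bound would have to be based on the aggregated job counters instead.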
A future enhancement might be to write the bad/missing rowkeys to a Phoenix
table, such that they're queryable.
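Going back to case 1a, here is a toy model of what I'm worried about. It uses sorted sets of String rowkeys instead of real HBase byte[] keys and scans, and SplitRangeDemo and its methods are hypothetical names, not code from the patch. It assumes the target scan starts at the first source rowkey found in the split, which is how I read the current logic.

```java
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy model of one mapper's target scan range; Strings stand in for HBase rowkeys. */
final class SplitRangeDemo {

    /** Target rows visited when the scan starts at the first source rowkey in the split. */
    static SortedSet<String> scanFromFirstSourceRow(NavigableSet<String> source,
                                                    NavigableSet<String> target,
                                                    String splitStart,
                                                    String splitStop) {
        NavigableSet<String> sourceInSplit = source.subSet(splitStart, true, splitStop, false);
        if (sourceInSplit.isEmpty()) {
            return new TreeSet<>(); // no source rows in this split, so nothing is scanned
        }
        return new TreeSet<>(target.subSet(sourceInSplit.first(), true, splitStop, false));
    }

    /** Target rows visited when the scan starts at the split's lower bound instead. */
    static SortedSet<String> scanFromSplitStart(NavigableSet<String> target,
                                                String splitStart,
                                                String splitStop) {
        return new TreeSet<>(target.subSet(splitStart, true, splitStop, false));
    }
}
```

With a split of ["a", "m"), source rows {"c", "d"} and target rows {"b", "c", "d"}, the first method never visits "b", so an extra target row ahead of the first source row would go unreported. The second method does visit it.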
Overall, very well done - especially on keeping the code readable!
> VerifyReplication using SQL
> ---------------------------
>
> Key: PHOENIX-3817
> URL: https://issues.apache.org/jira/browse/PHOENIX-3817
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Alex Araujo
> Assignee: Akshita Malhotra
> Priority: Minor
> Fix For: 4.15.0
>
> Attachments: PHOENIX-3817-final.patch, PHOENIX-3817-final2.patch,
> PHOENIX-3817.v1.patch, PHOENIX-3817.v2.patch, PHOENIX-3817.v3.patch,
> PHOENIX-3817.v4.patch, PHOENIX-3817.v5.patch, PHOENIX-3817.v6.patch,
> PHOENIX-3817.v7.patch
>
>
> Certain use cases may copy or replicate a subset of a table to a different
> table or cluster. For example, application topologies may map data for
> specific tenants to different peer clusters.
> It would be useful to have a Phoenix VerifyReplication tool that accepts an
> SQL query, a target table, and an optional target cluster. The tool would
> compare data returned by the query on the different tables and update various
> result counters (similar to HBase's VerifyReplication).