[
https://issues.apache.org/jira/browse/PHOENIX-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351417#comment-17351417
]
ASF GitHub Bot commented on PHOENIX-6476:
-----------------------------------------
tkhurana opened a new pull request #1240:
URL: https://github.com/apache/phoenix/pull/1240
When verifying from index table to data table, there were 2 issues:
1. Data table region boundary keys were not being respected, so the splits
were happening only on the basis of the per-task max size.
2. The actual index mutation map was not being split for every task, although
the data row keys were. This caused the tool to report extra index
rows which were actually false positives.
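The two issues above can be illustrated with a minimal sketch. This is not the code from the pull request: the class `TaskSplitSketch`, the `Task` type, and the `split` method are hypothetical names, row keys are modeled as `String` instead of `byte[]`, and the index mutation map is simplified to one index row per data row key. The sketch only shows the idea of starting a new task at every data table region boundary and at the per-task size limit, and of giving each task only the index mutations for its own data row keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from Phoenix.
final class TaskSplitSketch {

    /** One unit of work: data row keys plus only the index mutations that belong to them. */
    static final class Task {
        final List<String> dataRowKeys = new ArrayList<>();
        final Map<String, String> indexMutations = new TreeMap<>();
    }

    /**
     * Splits one page into tasks.
     *
     * @param sortedDataRowKeys data row keys of the page, in ascending order
     * @param regionStartKeys   ascending start keys of the data table regions (assumed input)
     * @param pageMutations     per-page map: data row key -> index row (simplified)
     * @param maxRowsPerTask    upper bound on data row keys per task
     */
    static List<Task> split(List<String> sortedDataRowKeys,
                            List<String> regionStartKeys,
                            Map<String, String> pageMutations,
                            int maxRowsPerTask) {
        List<Task> tasks = new ArrayList<>();
        Task current = null;
        int nextBoundary = 0; // index of the next region start key not yet crossed
        for (String key : sortedDataRowKeys) {
            // Issue 1: detect whether this key lies in a later region than the previous key.
            boolean crossedRegion = false;
            while (nextBoundary < regionStartKeys.size()
                    && key.compareTo(regionStartKeys.get(nextBoundary)) >= 0) {
                crossedRegion = true;
                nextBoundary++;
            }
            if (current == null || crossedRegion
                    || current.dataRowKeys.size() >= maxRowsPerTask) {
                current = new Task();
                tasks.add(current);
            }
            current.dataRowKeys.add(key);
            // Issue 2: copy only this key's mutation into the task, not the whole page map.
            String indexRow = pageMutations.get(key);
            if (indexRow != null) {
                current.indexMutations.put(key, indexRow);
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        Map<String, String> pageMutations = new TreeMap<>();
        for (String k : Arrays.asList("a", "b", "c", "d", "e")) {
            pageMutations.put(k, "idx_" + k);
        }
        List<Task> tasks = split(Arrays.asList("a", "b", "c", "d", "e"),
                Arrays.asList("c"), pageMutations, 2);
        for (Task t : tasks) {
            System.out.println(t.dataRowKeys + " -> " + t.indexMutations);
        }
    }
}
```

With a region boundary at "c" and at most 2 rows per task, the five keys end up in three tasks, and each task's mutation map has exactly as many entries as the task has data row keys. If every task instead received the whole page map, rows belonging to other tasks would look like extra index rows, which is the false positive described above.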
> Index tool when verifying from index to data doesn't correctly split page
> into tasks
> ------------------------------------------------------------------------------------
>
> Key: PHOENIX-6476
> URL: https://issues.apache.org/jira/browse/PHOENIX-6476
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.14.3, 4.16.0, 4.16.1
> Reporter: Tanuj Khurana
> Assignee: Tanuj Khurana
> Priority: Major
>
> When the index tool runs with the index table as the source, it splits a page
> into tasks when the page size is greater than the configured task size (default
> 2048) and runs the tasks in parallel. Each task is assigned a set of data row
> keys, but the index mutation map is not split according to the data row keys
> assigned to that task. As a result, the tool reports wrong results
> because the index mutation map is per page while the set of data row keys is
> per task.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)