[
https://issues.apache.org/jira/browse/PHOENIX-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551263#comment-17551263
]
Kadir OZDEMIR edited comment on PHOENIX-6702 at 6/7/22 7:54 PM:
----------------------------------------------------------------
I did further debugging. This bug happens even with a simpler version of the
test which does not write nulls (at first I thought the null handling logic had
an issue, but that was not the case). It also happens without server-side
paging (I disabled paging and verified it).
I could also reproduce it without truncating the existing index, by adding a
new index instead. When the test completes upserting rows concurrently, the
most recent versions of all the rows end up with the same timestamp. This is
because the batch size is 200 and the number of distinct data table rows is 51.
So every concurrent batch writes all the rows and thus they get the same
timestamp (the Phoenix coproc assigns the same timestamp for all mutations
within a batch).
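The batch arithmetic above can be illustrated with a small standalone
simulation. This is only a sketch, not Phoenix code: it assumes that each
writer cycles through the row keys in order and that a whole batch receives a
single timestamp. The class name, method name and timestamp values are
hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class BatchTimestampSketch {

    /**
     * Simulates full batches of upserts that cycle through the row keys.
     * Every mutation in a batch gets the same timestamp (assumption taken
     * from the comment above). Returns the latest timestamp per row key.
     */
    static Map<Integer, Long> latestTimestamps(int distinctRows, int batchSize, int batches) {
        Map<Integer, Long> latest = new HashMap<>();
        int next = 0;
        for (int b = 0; b < batches; b++) {
            long batchTs = 1000L + b; // hypothetical: one timestamp per batch
            for (int i = 0; i < batchSize; i++) {
                int row = next % distinctRows; // assumed: keys are cycled in order
                next++;
                latest.merge(row, batchTs, Math::max);
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        // 51 distinct rows, batches of 200: each batch touches every row.
        Map<Integer, Long> latest = latestTimestamps(51, 200, 5);
        Set<Long> distinct = new HashSet<>(latest.values());
        System.out.println("rows=" + latest.size()
                + " distinct latest timestamps=" + distinct.size());
    }
}
```

Because the batch size (200) exceeds the number of distinct rows (51), the
last batch overwrites every row, so under these assumptions all rows end up
with one common latest timestamp.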
I verified that the index is rebuilt correctly and that all the most recent
versions of the index rows have the same timestamp, which is equal to the
timestamp of the most recent data table rows (as expected). Index Tool rebuilds
all the rows with the verified status and thus no read repair happens during
scans after the index tool rebuild (after truncate). I also verified with the
debugger that the repair code does not get activated. I even completely
bypassed the index repair code (GlobalIndexChecker) and got the same failure.
When index scrutiny scans the index table after truncate and rebuild, for some
index rows the older versions show up in the result set of the index scan.
These older versions do not match the data table rows and the test fails. As
[~richardantal] observed, if you remove the assert for matching, you see that
the scan returns one or more extra rows.
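To make the mismatch concrete, here is a simplified model of the kind of check
described above. It is a sketch only and is not the Phoenix index scrutiny
implementation: the IndexRow type, the unmatched helper and the sample values
are all hypothetical. An index row is modeled as an indexed value plus the
data table primary key, and any index row whose value differs from the current
data row counts as an extra (stale) row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ScrutinySketch {

    /** Hypothetical index row: indexed value plus the data table primary key. */
    static final class IndexRow {
        final String indexedValue;
        final int dataPk;

        IndexRow(String indexedValue, int dataPk) {
            this.indexedValue = indexedValue;
            this.dataPk = dataPk;
        }
    }

    /** Returns the index rows that do not match the current data table row. */
    static List<IndexRow> unmatched(Map<Integer, String> dataRows, List<IndexRow> indexScan) {
        List<IndexRow> extra = new ArrayList<>();
        for (IndexRow row : indexScan) {
            String current = dataRows.get(row.dataPk);
            if (current == null || !current.equals(row.indexedValue)) {
                extra.add(row); // older version leaked into the scan result
            }
        }
        return extra;
    }

    public static void main(String[] args) {
        // Current data table state: primary key -> indexed column value.
        Map<Integer, String> dataRows = Map.of(1, "b", 2, "x");
        // Index scan result containing one older version for primary key 1.
        List<IndexRow> indexScan = List.of(
                new IndexRow("a", 1),
                new IndexRow("b", 1),
                new IndexRow("x", 2));
        System.out.println("index rows=" + indexScan.size()
                + " data rows=" + dataRows.size()
                + " unmatched=" + unmatched(dataRows, indexScan).size());
    }
}
```

In this model the scan returns three index rows for two data rows, which
mirrors the "one or more extra rows" symptom without saying anything about
its cause.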
Interestingly, when I dump the index table each time index scrutiny finds a
mismatch, the dump returns the correct versions of the rows.
was (Author: kozdemir):
I did further debugging. This bug happens even with a simpler version of the
test which does not write nulls (at first I thought the null handling logic had
an issue, but that was not the case). It also happens without server-side
paging (I disabled paging and verified it).
I could also reproduce it without truncating the existing index, by adding a
new index instead. When the test completes upserting rows concurrently, the
most recent versions of all the rows end up with the same timestamp. This is
because the batch size is 200 and the number of distinct data table rows is 51.
So every concurrent batch writes all the rows and thus they get the same
timestamp (the Phoenix coproc assigns the same timestamp for all mutations
within a batch).
I verified that the index is rebuilt correctly and that all the most recent
versions of the index rows have the same timestamp, which is equal to the
timestamp of the most recent data table rows (as expected). Index Tool rebuilds
all the rows with the verified status and thus no read repair happens during
scans after the index tool rebuild (after truncate). I also verified with the
debugger that the repair code does not get activated. I even completely
bypassed the index repair code (GlobalIndexChecker) and got the same failure.
When index scrutiny scans the index table after truncate and rebuild, for some
index rows the older versions show up in the result set of the index scan.
These older versions do not match the data table rows and the test fails. As
[~richardantal] observed, if you remove the assert for matching, you see that
the scan returns one or more extra rows.
It is interesting that when I dump the index table using TestUtil whenever
there is a mismatch found by index scrutiny, the scan returns the correct
versions of the rows.
> ConcurrentMutationsExtendedIT and PartialIndexRebuilderIT fail on Hbase
> 2.4.11+
> -------------------------------------------------------------------------------
>
> Key: PHOENIX-6702
> URL: https://issues.apache.org/jira/browse/PHOENIX-6702
> Project: Phoenix
> Issue Type: Bug
> Components: core
> Affects Versions: 5.2.0, 5.1.3
> Reporter: Istvan Toth
> Assignee: Kadir OZDEMIR
> Priority: Blocker
> Fix For: 5.2.0
>
> Attachments: bisect.sh
>
>
> On my local machine
> ConcurrentMutationsExtendedIT.testConcurrentUpserts failed 6 out of 10 times
> while PartialIndexRebuilderIT.testConcurrentUpsertsWithRebuild failed 10 out
> of 10 times with HBase 2.4.11 (the default build).
> The same tests succeeded 3 out of 3 times with HBase 2.3.7.
> Either HBase 2.4 has a bug, or our compatibility modules need to be fixed.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)