[ 
https://issues.apache.org/jira/browse/PHOENIX-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551263#comment-17551263
 ] 

Kadir OZDEMIR edited comment on PHOENIX-6702 at 6/7/22 7:54 PM:
----------------------------------------------------------------

I did further debugging. This bug happens even with a simpler version of the 
test that does not write nulls (at first I thought the null handling logic had 
an issue, but that was not the case). It also happens without server-side 
paging (I disabled paging and verified it).

I could also reproduce it without truncating the existing index but by adding a 
new index. When the test finishes upserting rows concurrently, the most recent 
versions of all the rows end up with the same timestamp. This is because the 
batch size is 200 and the number of distinct data table rows is 51. So every 
concurrent batch writes all the rows and thus they get the same timestamp (the 
Phoenix coproc assigns the same timestamp to all mutations within a batch).
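
The timestamp argument above can be checked with a small model. This is only
a sketch, not Phoenix code: it assumes a single writer, that upsert i targets
row key i % 51, and that each batch of 200 consecutive upserts gets one
timestamp (modeled as the batch index). The function name and parameters are
hypothetical.

```python
from typing import Dict

N_ROWS = 51       # distinct data table rows, as stated in the comment
BATCH_SIZE = 200  # mutations per batch, as stated in the comment


def latest_timestamps(total_upserts: int, n_rows: int = N_ROWS,
                      batch_size: int = BATCH_SIZE) -> Dict[int, int]:
    """Return the timestamp of the most recent version of each row key.

    Assumptions (not taken from the Phoenix source): upsert i targets row
    key i % n_rows, and every mutation in a batch shares one timestamp,
    modeled here as the batch index.
    """
    latest: Dict[int, int] = {}
    for i in range(total_upserts):
        batch_ts = i // batch_size      # one timestamp per batch
        latest[i % n_rows] = batch_ts   # a later batch overwrites an earlier one
    return latest


# The batch is larger than the row count, so the last full batch touches
# every row and all latest versions share a single timestamp.
same = latest_timestamps(1000)
print(len(same), set(same.values()))

# With a batch smaller than the row count the last batch covers only some
# rows, so the latest versions carry several different timestamps.
mixed = latest_timestamps(1000, batch_size=20)
print(len(set(mixed.values())))
```

In this model the result only holds when the final batch is a full one; a
trailing partial batch shorter than 51 upserts would leave some rows with an
older timestamp.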

I verified that the index is rebuilt correctly and that all the most recent 
versions of the index rows have the same timestamp, which is equal to the 
timestamp of the most recent data table rows (as expected). Index Tool rebuilds 
all the rows with the verified status and thus no read repair happens during 
scans after the index tool rebuild (after truncate). I also verified with the 
debugger that the repair code does not get activated. I even completely 
bypassed the index repair code (GlobalIndexChecker) and got the same failure.

When index scrutiny scans the index table after truncate and rebuild, older 
versions of some index rows show up in the result set of the index scan. 
These older versions do not match the data table rows and the test 
fails. As [~richardantal] observed, if you remove the assert for matching, you 
will see that the scan returns one or more extra rows.
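
To make the mismatch concrete, here is a toy version of the comparison. It is
a hedged illustration only, not the actual index scrutiny code: it assumes an
index row can be reduced to a pair of (indexed value, data table primary key)
and the data table to a map from primary key to current indexed value. All
names are hypothetical.

```python
from typing import Dict, List, Tuple

IndexRow = Tuple[str, int]  # (indexed value, data table primary key)


def unmatched_index_rows(index_scan: List[IndexRow],
                         data_rows: Dict[int, str]) -> List[IndexRow]:
    """Return scanned index rows that disagree with the data table.

    An index row matches when the data table holds the same indexed value
    for its primary key; anything else is reported as a mismatch.
    """
    return [(value, pk) for value, pk in index_scan
            if data_rows.get(pk) != value]


# Current data table state: two rows.
data = {1: "b", 2: "c"}

# The index scan returns an older version ("a", 1) next to the current rows,
# so it yields three rows for a two-row data table.
scan = [("a", 1), ("b", 1), ("c", 2)]
print(unmatched_index_rows(scan, data))
```

The older version appears both as a row that fails the match and as an extra
row in the scan result, which is consistent with the two symptoms described
above.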

Interestingly, whenever index scrutiny finds a mismatch and I dump the index 
table, the dump returns the correct versions of the rows.




> ConcurrentMutationsExtendedIT and PartialIndexRebuilderIT fail on Hbase 
> 2.4.11+
> -------------------------------------------------------------------------------
>
>                 Key: PHOENIX-6702
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6702
>             Project: Phoenix
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 5.2.0, 5.1.3
>            Reporter: Istvan Toth
>            Assignee: Kadir OZDEMIR
>            Priority: Blocker
>             Fix For: 5.2.0
>
>         Attachments: bisect.sh
>
>
> On my local machine
> ConcurrentMutationsExtendedIT.testConcurrentUpserts failed 6 out of 10 times 
> while PartialIndexRebuilderIT.testConcurrentUpsertsWithRebuild failed 10 out 
> of 10 times with HBase 2.4.11 (the default build).
> The same tests succeeded 3 out of 3 times with HBase 2.3.7.
> Either HBase 2.4 has a bug, or our compatibility modules need to be fixed.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
