[ 
https://issues.apache.org/jira/browse/PHOENIX-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551335#comment-17551335
 ] 

Kadir OZDEMIR commented on PHOENIX-6702:
----------------------------------------

More on this. When IndexTool builds an index, it generates the entire history 
of the index updates based on the data table row versions. If an index update 
makes an index row invalid, the row gets deleted.
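
The replay described above can be pictured with a small model. This is a hypothetical sketch, not IndexTool's code: each data row version is reduced to the value of the indexed column V1, and the index row key is simplified to the string value|dataRowKey. Each version puts its index row, and when V1 changes, the previous index row becomes invalid and receives a DeleteFamily marker at the new version's timestamp. The V1 values in main are made up; only the timestamps are taken from the raw dump, and the output for the X|row1 key follows the same Put, DeleteFamily, Put, DeleteFamily pattern that the dump shows for one index row.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class IndexHistorySketch {
    // Hypothetical model of one data row version: only the value of the
    // indexed column (V1) at a timestamp; null means V1 is not set.
    record DataVersion(long ts, String indexedValue) {}

    // Hypothetical index mutation: a Put of an index row or a DeleteFamily
    // marker on an index row key, both at a given timestamp.
    record IndexMutation(String type, String indexRowKey, long ts) {}

    // Simplified index row key: indexed value, a separator, the data row key.
    static String indexRowKey(String indexedValue, String dataRowKey) {
        return (indexedValue == null ? "" : indexedValue) + "|" + dataRowKey;
    }

    // Replays the data row versions (assumed sorted by ascending timestamp).
    // Each version puts its index row; if the indexed value changed, the
    // previous index row is no longer valid and is deleted at the same timestamp.
    static List<IndexMutation> buildHistory(String dataRowKey, List<DataVersion> versions) {
        List<IndexMutation> mutations = new ArrayList<>();
        String previousKey = null;
        for (DataVersion v : versions) {
            String key = indexRowKey(v.indexedValue(), dataRowKey);
            if (previousKey != null && !previousKey.equals(key)) {
                mutations.add(new IndexMutation("DeleteFamily", previousKey, v.ts()));
            }
            mutations.add(new IndexMutation("Put", key, v.ts()));
            previousKey = key;
        }
        return mutations;
    }

    public static void main(String[] args) {
        // Hypothetical V1 values; only the timestamps mirror the raw dump.
        List<DataVersion> versions = List.of(
                new DataVersion(1654643902381L, "X"),
                new DataVersion(1654643902385L, "Y"),
                new DataVersion(1654643902455L, "X"),
                new DataVersion(1654643902464L, "Y"));
        buildHistory("row1", versions).forEach(System.out::println);
    }
}
{code}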

What happens here is that the scanner opened by IndexScrutiny gets one of these 
deleted rows in the result set of its index table scan, and this causes the 
mismatch between the data table and the index. The following is from the 
debugging session I did with [~tkhurana] today.

{code:java}
\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/B:/1654643902464/DeleteFamily/vlen=0/seqid=0
 column=  val=
\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/B:/1654643902385/DeleteFamily/vlen=0/seqid=0
 column=  val=
\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/B:B:V2/1654643902455/Put/vlen=4/seqid=0
 column= B:V2 val=eF\x98[
\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/B:_0/1654643902455/Put/vlen=1/seqid=0
 column= _0 val=\x01
\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/B:_0/1654643902381/Put/vlen=1/seqid=0
 column= _0 val=\x01
\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/C:/1654643902464/DeleteFamily/vlen=0/seqid=0
 column=  val=
\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/C:/1654643902385/DeleteFamily/vlen=0/seqid=0
 column=  val=
\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/C:C:V3/1654643902381/Put/vlen=4/seqid=0
 column= C:V3 val=C\x83)\x8B
{code}

Here an index row was added with timestamp 1654643902455 and then deleted with 
timestamp 1654643902464.
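
Why this row should be invisible follows from HBase delete semantics: a DeleteFamily marker hides every cell in its column family whose timestamp is less than or equal to the marker's timestamp. The sketch below applies only that rule to the cells from the raw dump above. It is a simplified model that ignores max versions, KEEP_DELETED_CELLS, and point-in-time reads, and its Cell record is hypothetical rather than HBase's Cell interface. Every Put is masked, so a read at the latest timestamp should return nothing for this index row.

{code:java}
import java.util.List;
import java.util.stream.Collectors;

public class DeleteMaskSketch {
    // Hypothetical simplified cell: family, qualifier, timestamp, and whether
    // it is a DeleteFamily marker (markers have an empty qualifier).
    record Cell(String family, String qualifier, long ts, boolean deleteFamily) {}

    // A Put is masked if a DeleteFamily marker in the same family has a
    // timestamp greater than or equal to the Put's timestamp.
    static boolean isMasked(Cell put, List<Cell> row) {
        return row.stream().anyMatch(c -> c.deleteFamily()
                && c.family().equals(put.family())
                && c.ts() >= put.ts());
    }

    // Returns the Puts that a read at the latest timestamp would still see.
    static List<Cell> visiblePuts(List<Cell> row) {
        return row.stream()
                .filter(c -> !c.deleteFamily())
                .filter(c -> !isMasked(c, row))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Cells of the index row from the raw dump above.
        List<Cell> row = List.of(
                new Cell("B", "", 1654643902464L, true),
                new Cell("B", "", 1654643902385L, true),
                new Cell("B", "B:V2", 1654643902455L, false),
                new Cell("B", "_0", 1654643902455L, false),
                new Cell("B", "_0", 1654643902381L, false),
                new Cell("C", "", 1654643902464L, true),
                new Cell("C", "", 1654643902385L, true),
                new Cell("C", "C:V3", 1654643902381L, false));
        System.out.println(visiblePuts(row)); // prints []
    }
}
{code}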

This deleted index row was also seen by the read repair code 
(GlobalIndexChecker) on the server side when IndexScrutiny scanned it using 
the Phoenix client. I added logs to print the rows GlobalIndexChecker received 
from its internal scanner and passed up.

{code:java}
i:\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/B:B:V2/1654643902455/Put/vlen=4/seqid=142/eF\x98[
i:\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/B:_0/1654643902455/Put/vlen=1/seqid=142/\x01
*
o:\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/B:B:V2/1654643902455/Put/vlen=4/seqid=142/eF\x98[
{code}

So GlobalIndexChecker received two cells (labelled "i:"), including the empty 
column cell, for this deleted index row and returned one cell (labelled "o:") 
after dropping the empty column cell. The row was verified, so no repair was 
done.
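
The behaviour in the log can be modelled as follows. This is a hypothetical sketch, not GlobalIndexChecker's implementation: it treats the empty column _0 with value 0x01 as the verified flag (an assumption consistent with the log above), returns the remaining cells when the row is verified, and signals that the row would need repair otherwise. In this model the checker only inspects the cells its internal scanner hands it, so a deleted row that leaks through that scanner with a verified empty column is passed up unchanged.

{code:java}
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class VerifiedRowSketch {
    // Hypothetical simplified cell: qualifier and value bytes.
    record Cell(String qualifier, byte[] value) {}

    static final String EMPTY_COLUMN = "_0"; // empty column qualifier seen in the logs
    static final byte VERIFIED = 0x01;        // assumed verified flag (value \x01 in the logs)

    // Returns the cells to pass up if the row is verified; otherwise returns
    // Optional.empty() to signal that the row would need read repair.
    static Optional<List<Cell>> checkRow(List<Cell> cells) {
        boolean verified = cells.stream().anyMatch(c -> c.qualifier().equals(EMPTY_COLUMN)
                && c.value().length == 1
                && c.value()[0] == VERIFIED);
        if (!verified) {
            return Optional.empty();
        }
        // Drop the empty column cell before passing the row up.
        return Optional.of(cells.stream()
                .filter(c -> !c.qualifier().equals(EMPTY_COLUMN))
                .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        // The two "i:" cells from the log: B:V2 = eF\x98[ and _0 = \x01.
        List<Cell> in = List.of(
                new Cell("B:V2", new byte[] {0x65, 0x46, (byte) 0x98, 0x5B}),
                new Cell("_0", new byte[] {0x01}));
        // prints "1 cell(s) passed up"
        checkRow(in).ifPresent(out -> System.out.println(out.size() + " cell(s) passed up"));
    }
}
{code}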

I also added logs to print the key-value pairs received by IndexScrutiny. As 
you can see below, IndexScrutiny also received this deleted index row:

{code:java}
index:keyvalues={\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/_v:\x00\x00\x00\x01/1654643902455/Put/vlen=6/seqid=0/value=eF�[
 }

data:keyvalues={\x80\x00\x00\x01\x80\x00\x00\x00/_v:\x00\x00\x00\x01/1654643903636/Put/vlen=6/seqid=0/value=��q�
 }
{code}

IndexScrutiny formed the data table row key and then fetched the corresponding 
row, as shown above. However, the data table row does not match the index row 
because the value of the index column (V1) for this data table row is null (see 
the dump below). Please note that the data table row timestamp is higher than 
the index row timestamp; they are always supposed to be the same. IndexScrutiny 
gets the most recent version of the data table row, as can be seen from the 
following data table row dump:

{code:java}
************ dumping N000001;hconnection-0x6562cc23 **************
\x80\x00\x00\x00\x80\x00\x00\x00/A:_0/1654643903636/Put/vlen=1/seqid=0 column= _0 val=x
1
\x80\x00\x00\x01\x80\x00\x00\x00/A:_0/1654643903636/Put/vlen=1/seqid=0 column= _0 val=x
\x80\x00\x00\x01\x80\x00\x00\x00/C:V3/1654643903636/Put/vlen=4/seqid=0 column= V3 val=\xF1\xCFq\x9A
2
\x80\x00\x00\x02\x80\x00\x00\x00/A:_0/1654643903636/Put/vlen=1/seqid=0 column= _0 val=x
\x80\x00\x00\x02\x80\x00\x00\x00/C:V3/1654643903636/Put/vlen=4/seqid=0 column= V3 val=AQ\xA4?
3

{code}

The scan also returns the correct version of the index row:

{code:java}
index:keyvalues={\x00\x80\x00\x00\x01\x80\x00\x00\x00/_v:\x00\x00\x00\x01/1654643903636/Put/vlen=6/seqid=0/value=��q�
 }

data:keyvalues={\x80\x00\x00\x01\x80\x00\x00\x00/_v:\x00\x00\x00\x01/1654643903636/Put/vlen=6/seqid=0/value=��q�
 }

{code}
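
The mismatch can be reproduced with the row keys and timestamps from the logs above. This hypothetical sketch simplifies Phoenix's row key encoding: it assumes the index row key is the V1 value, one 0x00 separator byte, and then the data table row key, so a null V1 yields a key that starts with 0x00. Under that assumption, both index rows map to the same data table row, but only the current one agrees with it on V1 (null) and on the timestamp.

{code:java}
import java.util.Arrays;

public class ScrutinySketch {
    // Simplifying assumption: the index row key is the indexed value (V1),
    // then one 0x00 separator, then the data table row key. A null V1 is an
    // empty prefix, so the key starts with the separator.
    static int separatorIndex(byte[] indexRowKey) {
        for (int i = 0; i < indexRowKey.length; i++) {
            if (indexRowKey[i] == 0x00) {
                return i;
            }
        }
        throw new IllegalArgumentException("no 0x00 separator in index row key");
    }

    // Derives the data table row key that scrutiny would look up.
    static byte[] dataRowKey(byte[] indexRowKey) {
        return Arrays.copyOfRange(indexRowKey, separatorIndex(indexRowKey) + 1, indexRowKey.length);
    }

    // Extracts the indexed value prefix (empty for a null V1).
    static byte[] indexedValue(byte[] indexRowKey) {
        return Arrays.copyOfRange(indexRowKey, 0, separatorIndex(indexRowKey));
    }

    // The index row is consistent with the data row only if the indexed value
    // matches and the two timestamps are equal.
    static boolean isConsistent(byte[] indexRowKey, long indexTs, byte[] dataV1, long dataTs) {
        byte[] expected = (dataV1 == null) ? new byte[0] : dataV1;
        return Arrays.equals(indexedValue(indexRowKey), expected) && indexTs == dataTs;
    }

    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] dataKey = {(byte) 0x80, 0, 0, 1, (byte) 0x80, 0, 0, 0};
        byte[] staleIndexKey = concat(new byte[] {(byte) 0xC1, 0x11, 0x00}, dataKey);
        byte[] currentIndexKey = concat(new byte[] {0x00}, dataKey);

        // Both index rows point at the same data table row.
        System.out.println(Arrays.equals(dataRowKey(staleIndexKey), dataKey));   // true
        System.out.println(Arrays.equals(dataRowKey(currentIndexKey), dataKey)); // true

        // The data row has a null V1 and timestamp 1654643903636.
        System.out.println(isConsistent(staleIndexKey, 1654643902455L, null, 1654643903636L));   // false
        System.out.println(isConsistent(currentIndexKey, 1654643903636L, null, 1654643903636L)); // true
    }
}
{code}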

This does not look like a Phoenix indexing issue. Why does it happen with the 
index table? We do not know, but it is worth noting that IndexScrutiny scans 
all the index rows with one scan but scans each data table row with a separate 
scan.

Is this an HBase bug or a compatibility issue between Phoenix and HBase 
2.4.11+? We think it is not an HBase bug because, when we dump the index table 
just before the IndexScrutiny run, we get the correct index rows. So it is 
likely a compatibility issue between Phoenix and HBase 2.4.11+.
 


> ConcurrentMutationsExtendedIT and PartialIndexRebuilderIT fail on Hbase 
> 2.4.11+
> -------------------------------------------------------------------------------
>
>                 Key: PHOENIX-6702
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6702
>             Project: Phoenix
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 5.2.0, 5.1.3
>            Reporter: Istvan Toth
>            Assignee: Kadir OZDEMIR
>            Priority: Blocker
>             Fix For: 5.2.0
>
>         Attachments: bisect.sh
>
>
> On my local machine,
> ConcurrentMutationsExtendedIT.testConcurrentUpserts failed 6 out of 10 times, 
> while PartialIndexRebuilderIT.testConcurrentUpsertsWithRebuild failed 10 out 
> of 10 times with HBase 2.4.11 (the default build).
> The same tests succeeded 3 out of 3 times with HBase 2.3.7.
> Either HBase 2.4 has a bug, or our compatibility modules need to be fixed.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
