[ https://issues.apache.org/jira/browse/PHOENIX-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551335#comment-17551335 ]
Kadir OZDEMIR commented on PHOENIX-6702:
----------------------------------------
More on this. When IndexTool builds an index, it generates the entire history
of the index updates based on the data table row versions. If an index update
makes an index row invalid, the row gets deleted.
What happens here is that the scanner opened by IndexScrutiny gets one of these
deleted rows in the result set of its index table scan, and this causes the
mismatch between the data table and the index. The following is from the
debugging session I did with [~tkhurana] today.
{code:java}
\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/B:/1654643902464/DeleteFamily/vlen=0/seqid=0
column= val=
\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/B:/1654643902385/DeleteFamily/vlen=0/seqid=0
column= val=
\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/B:B:V2/1654643902455/Put/vlen=4/seqid=0
column= B:V2 val=eF\x98[
\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/B:_0/1654643902455/Put/vlen=1/seqid=0
column= _0 val=\x01
\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/B:_0/1654643902381/Put/vlen=1/seqid=0
column= _0 val=\x01
\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/C:/1654643902464/DeleteFamily/vlen=0/seqid=0
column= val=
\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/C:/1654643902385/DeleteFamily/vlen=0/seqid=0
column= val=
\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/C:C:V3/1654643902381/Put/vlen=4/seqid=0
column= C:V3 val=C\x83)\x8B
{code}
Here an index row was added with timestamp 1654643902455 and then deleted with
timestamp 1654643902464.
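To make the masking explicit: under HBase delete semantics, a DeleteFamily marker at timestamp T hides every cell of that family in the row whose timestamp is <= T. That means the Puts at 1654643902455 in family B should be invisible to a normal (non-raw) read. The sketch below is a simplified model of that one rule. It ignores MVCC read points, TTL, max versions and the other marker types. The class and method names are hypothetical and this is not HBase code (it uses records, so Java 16+).

{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of DeleteFamily masking for one column
 * family of one row: a marker at timestamp T hides every Put with ts <= T.
 */
public class DeleteFamilyMaskDemo {

    record Cell(String qualifier, long ts, boolean isDeleteFamily) {}

    /** Returns the Puts a normal (non-raw) read should still see. */
    static List<Cell> visiblePuts(List<Cell> family) {
        // Only the newest DeleteFamily marker matters: a Put is masked by
        // some marker exactly when its ts is <= the newest marker's ts.
        long newestMarker = Long.MIN_VALUE;
        for (Cell c : family) {
            if (c.isDeleteFamily()) {
                newestMarker = Math.max(newestMarker, c.ts());
            }
        }
        List<Cell> visible = new ArrayList<>();
        for (Cell c : family) {
            if (!c.isDeleteFamily() && c.ts() > newestMarker) {
                visible.add(c);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Family B of the index row from the raw dump above.
        List<Cell> familyB = List.of(
            new Cell("", 1654643902464L, true),
            new Cell("", 1654643902385L, true),
            new Cell("V2", 1654643902455L, false),
            new Cell("_0", 1654643902455L, false),
            new Cell("_0", 1654643902381L, false));
        // Every Put is at or below 1654643902464, so nothing is visible.
        System.out.println(visiblePuts(familyB).size()); // prints 0
    }
}
{code}

In other words, any scanner that hands up the B:V2 and B:_0 Puts at 1654643902455 is returning cells that the marker at 1654643902464 should already have hidden.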
This deleted index row was also seen by the read repair code
(GlobalIndexChecker) on the server side when IndexScrutiny scanned it using
the Phoenix client. I added logs to print the cells GlobalIndexChecker received
from its internal scanner and the cells it passed up.
{code:java}
i:\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/B:B:V2/1654643902455/Put/vlen=4/seqid=142/eF\x98[
i:\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/B:_0/1654643902455/Put/vlen=1/seqid=142/\x01
*
o:\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/B:B:V2/1654643902455/Put/vlen=4/seqid=142/eF\x98[
{code}
So GlobalIndexChecker received two cells (labelled "i:"), including the
empty column cell for this deleted index row, and returned one cell (labelled
"o:") after dropping the empty column cell. The row was verified, so no repair
was done.
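A simplified model of what the log shows: the checker can only judge the cells its internal scanner hands it, and here it got no delete marker, just a Put plus an empty column cell. In the sketch below, the empty column value `\x01` (as in the dump) is treated as the "verified" status. That mapping, the class and method names, and modelling read repair as a boolean flag are all assumptions for illustration. This is not GlobalIndexChecker's actual code.

{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, heavily simplified model of the verified-row check described
 * above. Names and the status byte are assumptions, not Phoenix's API.
 */
public class VerifiedRowFilterDemo {

    static final String EMPTY_COLUMN = "_0"; // empty column qualifier in the dump
    static final byte VERIFIED = 0x01;       // value seen in the dump; assumed meaning

    record Cell(String family, String qualifier, long ts, byte[] value) {}

    /** needsRepair = row would go to read repair; returned = cells passed up. */
    record Checked(boolean needsRepair, List<Cell> returned) {}

    static Checked check(List<Cell> rowCells) {
        boolean verified = false;
        List<Cell> out = new ArrayList<>();
        for (Cell c : rowCells) {
            if (c.qualifier().equals(EMPTY_COLUMN)) {
                // The status comes only from the cells received; a delete
                // marker that never reaches this code cannot affect it.
                verified = c.value().length == 1 && c.value()[0] == VERIFIED;
            } else {
                out.add(c); // the empty column cell is dropped, as in the "o:" log
            }
        }
        // An unverified row would be handed to read repair; modelled here as
        // returning no cells and setting the flag.
        return verified ? new Checked(false, out) : new Checked(true, List.of());
    }

    public static void main(String[] args) {
        // The two "i:" cells from the log above.
        List<Cell> in = List.of(
            new Cell("B", "V2", 1654643902455L, new byte[] {'e', 'F', (byte) 0x98, '['}),
            new Cell("B", EMPTY_COLUMN, 1654643902455L, new byte[] {VERIFIED}));
        Checked result = check(in);
        System.out.println(result.needsRepair() + " " + result.returned().size()); // prints false 1
    }
}
{code}

Under this model the masked row passes as verified with one cell returned, which matches the "i:"/"o:" log: the problem is upstream of the checker, in which cells the scanner hands it.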
I also added logs to print the key-value pairs received by IndexScrutiny. As
you can see below, IndexScrutiny also received this deleted index row:
{code:java}
index:keyvalues={\xC1\x11\x00\x80\x00\x00\x01\x80\x00\x00\x00/_v:\x00\x00\x00\x01/1654643902455/Put/vlen=6/seqid=0/value=eF�[
}
data:keyvalues={\x80\x00\x00\x01\x80\x00\x00\x00/_v:\x00\x00\x00\x01/1654643903636/Put/vlen=6/seqid=0/value=��q�
}
{code}
IndexScrutiny formed the data table row key and then fetched the corresponding
row, as shown above. However, the data table row does not match the index
row, because the index column (V1) value for this data table row is null (see
the dump below). Please note that the data table row timestamp is higher than
the index row timestamp; they are always supposed to be the same. IndexScrutiny
gets the most recent version of the data table row, as can be seen from the
following data table row dump:
{code:java}
************ dumping N000001;hconnection-0x6562cc23 **************
\x80\x00\x00\x00\x80\x00\x00\x00/A:_0/1654643903636/Put/vlen=1/seqid=0 column=
_0 val=x
1
\x80\x00\x00\x01\x80\x00\x00\x00/A:_0/1654643903636/Put/vlen=1/seqid=0 column=
_0 val=x
\x80\x00\x00\x01\x80\x00\x00\x00/C:V3/1654643903636/Put/vlen=4/seqid=0 column=
V3 val=\xF1\xCFq\x9A
2
\x80\x00\x00\x02\x80\x00\x00\x00/A:_0/1654643903636/Put/vlen=1/seqid=0 column=
_0 val=x
\x80\x00\x00\x02\x80\x00\x00\x00/C:V3/1654643903636/Put/vlen=4/seqid=0 column=
V3 val=AQ\xA4?
3
{code}
The scan also returns the correct version of the index row:
{code:java}
index:keyvalues={\x00\x80\x00\x00\x01\x80\x00\x00\x00/_v:\x00\x00\x00\x01/1654643903636/Put/vlen=6/seqid=0/value=��q�
}
data:keyvalues={\x80\x00\x00\x01\x80\x00\x00\x00/_v:\x00\x00\x00\x01/1654643903636/Put/vlen=6/seqid=0/value=��q�
}
{code}
This does not look like a Phoenix indexing issue. Why does it happen with the
index table? We do not know, but it is worth noting that IndexScrutiny scans
all the index rows with one scan, whereas it scans each data table row with a
separate scan.
Is this an HBase bug or a compatibility issue between Phoenix and HBase
2.4.11+? We think it is not an HBase bug, because when we dump the index table
just before the IndexScrutiny run, we get the correct index rows. So it is
likely a compatibility issue between Phoenix and HBase 2.4.11+.
> ConcurrentMutationsExtendedIT and PartialIndexRebuilderIT fail on Hbase
> 2.4.11+
> -------------------------------------------------------------------------------
>
> Key: PHOENIX-6702
> URL: https://issues.apache.org/jira/browse/PHOENIX-6702
> Project: Phoenix
> Issue Type: Bug
> Components: core
> Affects Versions: 5.2.0, 5.1.3
> Reporter: Istvan Toth
> Assignee: Kadir OZDEMIR
> Priority: Blocker
> Fix For: 5.2.0
>
> Attachments: bisect.sh
>
>
> On my local machine,
> ConcurrentMutationsExtendedIT.testConcurrentUpserts failed 6 out of 10 times,
> while PartialIndexRebuilderIT.testConcurrentUpsertsWithRebuild failed 10 out
> of 10 times with HBase 2.4.11 (the default build).
> The same tests succeeded 3 out of 3 times with HBase 2.3.7.
> Either HBase 2.4 has a bug, or our compatibility modules need to be fixed.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)