[
https://issues.apache.org/jira/browse/PHOENIX-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954303#comment-16954303
]
Kadir OZDEMIR commented on PHOENIX-5527:
----------------------------------------
[~gjacoby], thank you for the quick response. I am glad you liked the idea of
doing full write in phase 3. I am a bit surprised that it did not occur to me
before. I think your comment in our meeting last week about doing the full
write in both phases must have made me think along these lines, so I need to
give you credit for this. I really want to change the design to do this.
Now, regarding unverified cell timestamps, they do not really need to be the
same as their verified cell timestamps. I will try to explain this below.
We currently do not allow two writes for the same row to have the same
timestamp. Two concurrent writes on the same row are serialized using a Phoenix
level row lock for reading the existing data table row and for writing a data
table row. "Writes are concurrent" means they read the data table (to prepare
index mutations) before any of them updates the data table, so they do not see
each other's updates. This means they each mark the same existing row
unverified. Therefore, the current implementation generates multiple
key-values, one for each concurrent write, each with a different timestamp, to
set the status of the same existing index row to unverified. As a result, the
existing row will have multiple unverified key-values (cells), each with a
different timestamp. In other words, this means overwriting the same column of
the same row with the same value but different timestamps. Whether or not
these timestamps equal the timestamps assigned to the overwrites for which
these cells were created does not really matter. Read repair does not care
about the exact timestamp of the unverified row; it only cares that it falls
within the time range of the scan.
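To make that last point concrete, here is a toy Python sketch (not Phoenix code; the function name and the half-open time-range convention are assumptions for illustration): the repair decision depends only on whether the unverified cell's timestamp lands inside the scan's time range, not on its exact value.

```python
# Toy model of the read-repair visibility check (not Phoenix internals).
# HBase scan time ranges are half-open: [min_ts, max_ts).

def needs_repair(unverified_ts, scan_min_ts, scan_max_ts):
    """Return True if an unverified cell at unverified_ts is visible
    to a scan covering [scan_min_ts, scan_max_ts)."""
    return scan_min_ts <= unverified_ts < scan_max_ts

# Two concurrent writes left unverified cells at slightly different
# timestamps; both are seen by a scan covering that range, so the exact
# timestamp values do not matter.
assert needs_repair(1000, 0, 2000)
assert needs_repair(1001, 0, 2000)
assert not needs_repair(2500, 0, 2000)
```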
In addition to making existing rows unverified, the writes will have their own
unverified rows. While writing this, I just noticed that if a write is an
overwrite, we will generate two identical unverified cells: one for the
existing row and one for the new row. Again, whether their timestamps equal
the timestamp of the new row or are one ms less does not matter. Actually, it
is odd if they are equal to the timestamp of the new row: in that case, when
the third phase completes, we have two cells (verified and unverified) for the
same row with different values but the same timestamp. So we give HBase two
cells with the same key and timestamp but different values (i.e., verified and
unverified), and rely on HBase returning the last one we gave it. HBase does
this as far as I have observed, but I am not sure it is guaranteed. Anyhow,
assigning different timestamps to these cells makes more sense to me. I hope I
have convinced you of this.
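A toy Python model of the ambiguity (all names here are hypothetical; a dict stands in for the store, and this is not how HBase actually resolves cells): with equal timestamps the surviving value depends on put order, whereas with distinct timestamps the read is deterministic because the highest timestamp wins.

```python
# Toy cell store keyed by (row, qualifier, timestamp) -- not HBase internals.

def put(store, row, qual, ts, value):
    # Same (row, qual, ts): the later put silently replaces the earlier one,
    # modeling the "last cell we gave HBase wins" behavior observed above.
    store[(row, qual, ts)] = value

def get_latest(store, row, qual):
    # A read returns the cell with the largest timestamp for this column.
    cells = [(ts, v) for (r, q, ts), v in store.items() if (r, q) == (row, qual)]
    return max(cells)[1]

# Case 1: verified and unverified cells share a timestamp. The outcome hinges
# entirely on apply order, which is the fragile dependency.
store = {}
put(store, "idx1", "VERIFIED", 100, b"unverified")
put(store, "idx1", "VERIFIED", 100, b"verified")
assert get_latest(store, "idx1", "VERIFIED") == b"verified"

# Case 2: distinct timestamps (unverified one ms earlier). The read is
# deterministic regardless of apply order.
store2 = {}
put(store2, "idx1", "VERIFIED", 99, b"unverified")
put(store2, "idx1", "VERIFIED", 100, b"verified")
assert get_latest(store2, "idx1", "VERIFIED") == b"verified"
```

The design choice argued for above corresponds to case 2: giving the cells different timestamps removes the reliance on ordering behavior that may not be guaranteed.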
> Unverified index rows should not be deleted due to replication lag
> -------------------------------------------------------------------
>
> Key: PHOENIX-5527
> URL: https://issues.apache.org/jira/browse/PHOENIX-5527
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 5.0.0, 4.14.3
> Reporter: Kadir OZDEMIR
> Assignee: Kadir OZDEMIR
> Priority: Major
> Fix For: 4.15.0, 5.1.0
>
> Attachments: PHOENIX-5527.master.001.patch,
> PHOENIX-5527.master.002.patch
>
>
> The current default delete time for unverified index rows is 10 minutes. If
> an index table row is replicated before its data table row and the
> replicated row is unverified at the time of replication, it can be deleted
> when it is scanned on the destination cluster. To prevent these deletes due
> to replication lag issues, we should increase the default time to 7 days.
> This value is configurable using the configuration parameter,
> phoenix.global.index.row.age.threshold.to.delete.ms.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)