[ 
https://issues.apache.org/jira/browse/PHOENIX-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954303#comment-16954303
 ] 

Kadir OZDEMIR commented on PHOENIX-5527:
----------------------------------------

[~gjacoby], thank you for the quick response. I am glad you liked the idea of 
doing a full write in phase 3. I am a bit surprised that it did not occur to me 
before. I think your comment in our meeting last week about doing a full write 
in both phases must have made me think along these lines. So, I need to give 
you credit for this. I really want to change the design to do this.

Now, regarding unverified cell timestamps: they do not really need to be the 
same as their verified cell timestamps. I will try to explain this below.

Currently, we do not allow two writes for the same row to have the same 
timestamp. Two concurrent writes on the same row are serialized using a 
Phoenix-level row lock for reading the existing data table row and for writing 
a data table row. "Writes are concurrent" means they read the data table (to 
prepare index mutations) before either of them updates the data table. So, 
they do not see each other's updates. This means that they both mark the same 
existing index row unverified. Therefore, the current implementation generates 
multiple key-values, one for each concurrent write and each with a different 
timestamp, to set the status of the same existing index row to unverified. 
This means that the existing row will have multiple unverified key-values 
(cells), each with a different timestamp. In other words, we overwrite the 
same column of the same row with the same value but different timestamps. 
Whether these timestamps are equal to the timestamps assigned to the 
overwrites for which these cells are created does not really matter. The read 
repair does not care about the exact value of the timestamp of the unverified 
row; it just cares that it is within the time range of the scan.
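To illustrate the point above, here is a minimal Python model (not HBase or 
Phoenix code; the names and the read-repair check are my own simplification) 
of an index row's status column accumulating one unverified cell per 
concurrent write, each at a distinct timestamp. It shows that the exact 
timestamps of the unverified cells do not matter, only that one falls within 
the scan's time range:

```python
from collections import namedtuple

# A model of one version of the index row's "verified" status column.
Cell = namedtuple("Cell", ["ts", "value"])  # value: "verified" / "unverified"

def add_cell(cells, ts, value):
    """Add a version and keep versions ordered newest-first, as HBase does."""
    cells.append(Cell(ts, value))
    cells.sort(key=lambda c: -c.ts)

def needs_repair(cells, scan_min_ts, scan_max_ts):
    """Read repair triggers if the newest cell in the scan's time range is
    unverified; the cell's exact timestamp is otherwise irrelevant."""
    newest_in_range = next(
        (c for c in cells if scan_min_ts <= c.ts <= scan_max_ts), None)
    return newest_in_range is not None and newest_in_range.value == "unverified"

cells = []
add_cell(cells, 100, "verified")    # existing index row
add_cell(cells, 101, "unverified")  # concurrent write A marks it unverified
add_cell(cells, 102, "unverified")  # concurrent write B, different timestamp
print(needs_repair(cells, 0, 200))  # True: an unverified cell is in range
```

Whether write B's cell carries timestamp 102 or some other value in range, the 
repair decision comes out the same.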

In addition to making existing rows unverified, the writes will have their own 
unverified rows. While writing this, I just noticed that if a write is an 
overwrite, we will generate two identical unverified cells: one for the 
existing row and one for the new row. Again, whether their timestamps are 
equal to the timestamp of the new row or one ms less does not matter. 
Actually, it is weird that they are equal to the timestamp of the new row. If 
so, then when the third phase is completed, we have two cells (verified and 
unverified) for the same row with different values but the same timestamp. So 
we give HBase two cells with the same key and timestamp but different values 
(i.e., verified and unverified) and rely on the fact that HBase will return 
the last one we gave it. HBase does this as far as I have observed, but I am 
not sure it is guaranteed. Anyhow, assigning different timestamps to these 
cells makes more sense to me. I hope I have convinced you of this.
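A small Python model (again my own sketch, not HBase's actual cell comparator) 
of why the same-timestamp case is fragile: when two cells share a key and 
timestamp but differ in value, the "winner" is decided by something other than 
the data itself (in HBase, internal ordering such as the mvcc/sequence id), 
whereas with distinct timestamps the timestamp alone decides:

```python
def newest(cells):
    """Pick the newest version by timestamp only; cells are (ts, value).
    On a timestamp tie, max() returns the first maximal element, so the
    outcome depends purely on ordering we should not rely on."""
    return max(cells, key=lambda c: c[0])

ambiguous = [(100, "unverified"), (100, "verified")]  # same ts: tie-break needed
distinct  = [(99, "unverified"), (100, "verified")]   # verified cell 1 ms later

print(newest(ambiguous))  # (100, 'unverified'), only because it came first
print(newest(distinct))   # (100, 'verified'): unambiguous
```

With distinct timestamps, the phase-3 verified cell wins regardless of the 
order in which the store happened to receive the two cells.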

 

> Unverified index rows should not be deleted due to replication lag 
> -------------------------------------------------------------------
>
>                 Key: PHOENIX-5527
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5527
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.0.0, 4.14.3
>            Reporter: Kadir OZDEMIR
>            Assignee: Kadir OZDEMIR
>            Priority: Major
>             Fix For: 4.15.0, 5.1.0
>
>         Attachments: PHOENIX-5527.master.001.patch, 
> PHOENIX-5527.master.002.patch
>
>
> The current default delete time for unverified index rows is 10 minutes. If 
> an index table row is replicated before its data table row and the 
> replicated row is unverified at the time of replication, it can be deleted 
> when it is scanned on the destination cluster. To prevent these deletes due 
> to replication lag, we should increase the default time to 7 days. This 
> value is configurable via the configuration parameter 
> phoenix.global.index.row.age.threshold.to.delete.ms.
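For reference, the property named in the description would be set in 
hbase-site.xml roughly like this (the 7-day value is 7 * 24 * 60 * 60 * 1000 
ms; only the property name comes from the issue, the snippet is my own):

```xml
<!-- hbase-site.xml: keep unverified index rows for 7 days before deletion -->
<property>
  <name>phoenix.global.index.row.age.threshold.to.delete.ms</name>
  <value>604800000</value>
</property>
```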



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
