[ 
https://issues.apache.org/jira/browse/PHOENIX-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954430#comment-16954430
 ] 

Kadir OZDEMIR commented on PHOENIX-5527:
----------------------------------------

[~priyankporwal], I do not think I directly addressed [~gjacoby]'s concern. I 
will address it with an example.  He wrote "I'm worried that in a case of 
multiple writers, the ts -1 on step 1 would potentially "unverify" a previously 
verified row that had been committed one ms prior to the current write." 

Assume that there is a verified index row with timestamp t0 in the index table. 
This means there is a data table row with the same timestamp (i.e., t0). Now 
assume another write happens on the data row with timestamp t1, and assume 
t1 = t0+1, even though this will be impossible for years to come, as I explain 
below.

In the proposed solution, the first phase of the t1 write will unverify the 
index row at timestamp t1 - 1 = t0. As I explained in my previous comments, if 
you overwrite an existing cell with the same timestamp, HBase gives you back 
the last write. So the index row at t0 will be unverified, which is what we 
want.
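
To see this overwrite behavior in isolation, here is a small standalone HBase 
client sketch (the table and column names are made up for illustration; this 
is not Phoenix code):

{code:java}
import java.io.IOException;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SameTimestampOverwrite {
    public static void main(String[] args) throws IOException {
        byte[] row = Bytes.toBytes("k1");
        byte[] cf  = Bytes.toBytes("0");
        byte[] cq  = Bytes.toBytes("STATUS"); // hypothetical column, not the real verified column
        long t0 = 1000L;

        try (Connection conn = ConnectionFactory.createConnection();
             Table table = conn.getTable(TableName.valueOf("TEST_TABLE"))) {
            // First write to the cell at timestamp t0
            table.put(new Put(row).addColumn(cf, cq, t0, Bytes.toBytes("verified")));
            // Overwrite the same cell with the same timestamp t0
            table.put(new Put(row).addColumn(cf, cq, t0, Bytes.toBytes("unverified")));

            // Reading the cell gives back the last write for that timestamp
            Result r = table.get(new Get(row).addColumn(cf, cq));
            System.out.println(Bytes.toString(r.getValue(cf, cq))); // expected: "unverified"
        }
    }
}
{code}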

If the t0 write has not completed its third phase, and for some reason the t1 
write manages to complete its first phase, then the index row at t0 will end 
up verified, which we do not want. Please note that when the t1 write 
completes its third phase, the index row at t0 will be deleted and the problem 
will be fixed. However, it is possible that the t1 write does not complete its 
third phase before the index row is scanned. So, I agree that this is an issue 
if it happens. But it will not happen in our systems for years; please keep on 
reading.
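
To make the problematic interleaving concrete, here is a tiny in-memory sketch 
(not Phoenix code) that models the index cell as a map keyed by timestamp, 
where a later arrival overwrites an earlier one for the same timestamp, as 
described above:

{code:java}
import java.util.HashMap;
import java.util.Map;

public class InterleavingSketch {
    public static void main(String[] args) {
        // Models a single index cell: timestamp -> last value written for that timestamp
        Map<Long, String> indexCellByTs = new HashMap<>();
        long t0 = 1000L;

        // t1 write, first phase: unverify at t1 - 1 == t0
        // (the t0 write has not completed its third phase yet)
        indexCellByTs.put(t0, "unverified");

        // t0 write, third phase completes afterwards and marks the row verified at t0
        indexCellByTs.put(t0, "verified");

        // A scan at this point sees a verified row at t0, which is not what we want.
        // It is only repaired once the t1 write's third phase deletes the t0 index row.
        System.out.println(indexCellByTs.get(t0)); // prints "verified"
    }
}
{code}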

Let me explain why two back-to-back non-concurrent writes on the same data row 
will not happen with timestamps t and t+1 in the near future. The current 
implementation assigns a timestamp to a batch of mutations just after the 
locks are acquired for them. Let's say they get timestamp t, which is the 
current server timestamp. The next step is to prepare the index mutations. For 
this, the data table rows need to be read for the batch of mutations, and then 
the index mutations are prepared. If all of this completes within the same 
millisecond t (i.e., in less than 1 ms), then the current thread will sleep 
for 1 ms. This by itself is a very rare event. After that, the locks are 
released and the RPCs are made to update the index tables. When the index 
writes are done, the locks for the batch of mutations are acquired again to do 
the data table updates. When the data table updates complete, the row locks 
are released. So the next non-concurrent write on the same row can only happen 
after this. For the next non-concurrent write to get timestamp t+1, the data 
table reads for a batch of mutations must happen within 1 ms, and the RPC 
calls + index table writes + data table writes for the batch must also happen 
in less than 1 ms. This is impossible with the technology we have today.
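
Here is a simplified sketch of the timing described above. This is not the 
actual IndexRegionObserver code; the helper method and the batch type are 
placeholders, but the timestamp assignment and the 1 ms sleep follow the 
description:

{code:java}
import java.util.List;

import org.apache.hadoop.hbase.client.Mutation;

public class TimestampAssignmentSketch {

    static void writeBatch(List<Mutation> batchOfMutations) throws InterruptedException {
        // The timestamp is assigned right after the row locks are acquired for the batch
        long ts = System.currentTimeMillis();

        // Read the data table rows for the batch and prepare the index mutations (placeholder)
        prepareIndexMutations(batchOfMutations, ts);

        // If everything above finished within the same millisecond, sleep for 1 ms so that
        // the next non-concurrent write on the same rows cannot get the same timestamp
        if (System.currentTimeMillis() == ts) {
            Thread.sleep(1);
        }

        // The locks are then released, the index table RPCs are made, the locks are
        // re-acquired for the data table writes, and released again when those complete.
    }

    private static void prepareIndexMutations(List<Mutation> batch, long ts) {
        // Placeholder for reading the data rows and building the index mutations
    }
}
{code}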

 

 

> Unverified index rows should not be deleted due to replication lag 
> -------------------------------------------------------------------
>
>                 Key: PHOENIX-5527
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5527
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.0.0, 4.14.3
>            Reporter: Kadir OZDEMIR
>            Assignee: Kadir OZDEMIR
>            Priority: Major
>             Fix For: 4.15.0, 5.1.0
>
>         Attachments: PHOENIX-5527.master.001.patch, 
> PHOENIX-5527.master.002.patch
>
>
> The current default delete time for unverified index rows is 10 minutes. If 
> an index table row is replicated before its data table row and the index row 
> is unverified at the time of replication, it can be deleted when it is 
> scanned on the destination cluster. To prevent these deletes due to 
> replication lag, we should increase the default time to 7 days. This value is 
> configurable using the configuration parameter 
> phoenix.global.index.row.age.threshold.to.delete.ms.
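
For reference, a hypothetical way to override the threshold mentioned in the 
issue description above (in practice it would normally be set in 
hbase-site.xml on the servers; the property name comes from the description, 
the class name is made up):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class IndexRowAgeThresholdConfig {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // 7 days in milliseconds, the proposed default
        conf.setLong("phoenix.global.index.row.age.threshold.to.delete.ms",
                7L * 24 * 60 * 60 * 1000);
        System.out.println(conf.get("phoenix.global.index.row.age.threshold.to.delete.ms"));
    }
}
{code}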



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
