[
https://issues.apache.org/jira/browse/PHOENIX-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954430#comment-16954430
]
Kadir OZDEMIR commented on PHOENIX-5527:
----------------------------------------
[~priyankporwal], I do not think I directly addressed [~gjacoby]'s concern. I
will address it with an example. He wrote "I'm worried that in a case of
multiple writers, the ts -1 on step 1 would potentially "unverify" a previously
verified row that had been committed one ms prior to the current write."
Assume there is a verified index row with timestamp t0 in the index table. This
means there is a data table row with the same timestamp (i.e., t0). Now assume
another write happens on the data row with timestamp t1, and that t1 = t0+1,
even though this will be impossible for years to come, as I explain below.
In the proposed solution, the index row at t0 will then be unverified with
timestamp t0 (since the unverified cell of the t1 write is written at t1-1 =
t0). As I explained in my previous comments, if you overwrite an existing cell
with the same timestamp, HBase gives you back the last write. So the row will
be unverified, which is what we want.
However, if the t1 write completes its first phase before the t0 write
completes its third phase, the verified cell written by t0's third phase will
be the last write at timestamp t0, so the row at t0 will end up verified,
which we do not want. Please note that when the t1 write completes its third
phase, the t0 row will be deleted and the problem will be fixed. However, it
is possible that the t1 write does not complete its third phase before the
index row is scanned. So, I agree that this is an issue if it happens. But it
won't happen in our systems for years; please keep on reading.
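As a toy illustration of this interleaving (not Phoenix code, and assuming, per the discussion above, that step 1 writes an unverified cell at ts-1 and step 3 writes a verified cell at ts):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy timeline of status cells for one index row key, keyed by timestamp.
// A write at ts puts UNVERIFIED at ts-1 (step 1) and VERIFIED at ts (step 3).
public class InterleavingDemo {
    public static void main(String[] args) {
        long t0 = 100L, t1 = t0 + 1;

        // Problematic order: t1's step 1 lands BEFORE t0's step 3.
        NavigableMap<Long, String> cells = new TreeMap<>();
        cells.put(t1 - 1, "UNVERIFIED"); // t1 write, step 1 (t1-1 == t0)
        cells.put(t0, "VERIFIED");       // t0 write, step 3 overwrites same ts
        System.out.println(cells.get(t0)); // prints VERIFIED -- not what we want

        // t1's step 3 repairs it: verified cell at t1, old t0 version deleted.
        cells.put(t1, "VERIFIED");
        cells.remove(t0);
        System.out.println(cells.get(t0)); // prints null -- t0 row is gone
    }
}
```

The window between the two prints is exactly the race described above: a scan that lands inside it sees the verified t0 row.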
Let me explain why two back-to-back non-concurrent writes on the same data row
will not happen with timestamps t and t+1 in the near future. The current
implementation assigns a timestamp to a batch of mutations just after the
locks are acquired for them. Let's say they get timestamp t, the current
server timestamp. The next step is to prepare the index mutations; for this,
the data table rows for the batch need to be read. If all of this completes
within the same ms t (i.e., in less than 1 ms), then the current thread sleeps
for 1 ms. This by itself is a very rare event. After that, the row locks are
released and the RPCs are made to update the index tables. When the index
writes are done, the locks for the batch of mutations are acquired again to do
the data table updates, and when those complete, the row locks are released.
Only after this can the next non-concurrent write on the same row happen. For
that next write to get timestamp t+1, the data table reads for a batch of
mutations must happen in 1 ms, and the RPC calls + index table writes + data
table writes for the batch must also happen in less than 1 ms. This is
impossible with the technology we have today.
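The timing argument can be sketched as a back-of-envelope model. All step durations below are made-up assumptions, chosen only to show the arithmetic; the step names follow the write path described above.

```java
// Back-of-envelope model of the write path timing. Durations are in ms and
// are illustrative assumptions, not measurements.
public class TimestampGapDemo {
    public static void main(String[] args) {
        long t = 1000;                 // ms timestamp assigned under row lock
        double readAndPrepare = 0.3;   // read data rows + build index mutations
        double indexRpc = 2.0;         // RPC round trip for index table writes
        double dataWrite = 1.0;        // data table write under re-acquired lock

        double clock = t + readAndPrepare;
        if ((long) clock == t) {       // finished within the same ms t?
            clock = t + 1;             // then the thread sleeps to ms t+1
        }
        clock += indexRpc + dataWrite; // unlock, index RPC, relock, data write

        long nextTs = (long) clock;    // timestamp the next write would get
        System.out.println("next non-concurrent write ts = " + nextTs);
        System.out.println("gap = " + (nextTs - t) + " ms");
    }
}
```

With these assumed durations the gap is 4 ms; the gap only shrinks to 1 ms if the RPC plus both table writes fit inside a single millisecond.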
> Unverified index rows should not be deleted due to replication lag
> -------------------------------------------------------------------
>
> Key: PHOENIX-5527
> URL: https://issues.apache.org/jira/browse/PHOENIX-5527
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 5.0.0, 4.14.3
> Reporter: Kadir OZDEMIR
> Assignee: Kadir OZDEMIR
> Priority: Major
> Fix For: 4.15.0, 5.1.0
>
> Attachments: PHOENIX-5527.master.001.patch,
> PHOENIX-5527.master.002.patch
>
>
> The current default delete time for unverified index rows is 10 minutes. If
> an index table row is replicated before its data table row and the
> replicated row is unverified at the time of replication, it can be deleted
> when it is scanned on the destination cluster. To prevent these deletes due
> to replication lag, we should increase the default time to 7 days.
> This value is configurable using the configuration parameter,
> phoenix.global.index.row.age.threshold.to.delete.ms.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)