[
https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906884#action_12906884
]
Prakash Khemani commented on HBASE-2957:
----------------------------------------
Sorry, I was out and couldn't reply to this thread.
I think a general solution is possible: one that guarantees consistency for
PUTs and ICVs without holding the row lock while the hlog is being updated.
===
Thinking aloud. First, why do we want to hold the row lock around the log sync?
Because we want the log syncs to happen in causal order. Here is a scenario
of what can go wrong if we release the row lock before the sync completes.
1. client-1 does a put/icv on regionserver-1 (rs-1). rs-1 releases the row
lock before the sync completes.
2. client-2 comes in and reads the new value. Based on the value it just
read, client-2 then does a put on regionserver-2 (rs-2).
3. client-2's sync on rs-2 completes before client-1's sync on rs-1 does.
4. rs-1 is brought down ungracefully. After recovery we will have client-2's
update but not client-1's, and that violates the causal ordering of events.
===
So we don't want anyone to read a value which has not already been synced. I
think we can transfer the wait-for-sync to the reader instead of asking all
writers to wait.
A simple way to do that would be to attach a log-sync-number to every cell.
When a cell is updated, it stores the number of the next log sync, which is the
sync that will cover the update. A get will not return until the current
log-sync-number is at least as big as the log-sync-number stored in the cell.
An update can return immediately after queuing the sync. The "wait-for-sync" is
transferred from the writer to the reader. If the reader comes in sufficiently
late (which is likely) then there will be no wait-for-syncs in the system.
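
To make the idea concrete, here is a rough sketch of the reader-side wait. None
of this is existing HBase code: LogSyncer, Cell, queueEdit, completeSync and
awaitSync are names made up for illustration, the Cell monitor stands in for
the row lock, and the sketch assumes that syncs complete in order so a single
counter is enough.

```java
// Illustrative sketch only; LogSyncer and Cell are hypothetical, not HBase classes.
final class ReaderSideSync {

    /** Stand-in for the WAL syncer: hands out sync numbers and records completed ones. */
    static final class LogSyncer {
        private long nextSyncNumber = 1; // number of the sync that will cover newly queued edits
        private long lastSynced = 0;     // highest sync number known to be durable

        /** Writer side: queue an edit and learn which sync will make it durable. */
        synchronized long queueEdit() {
            return nextSyncNumber;
        }

        /** Syncer thread: everything queued so far is now durable. */
        synchronized void completeSync() {
            lastSynced = nextSyncNumber++;
            notifyAll();
        }

        synchronized boolean isSynced(long syncNumber) {
            return lastSynced >= syncNumber;
        }

        /** Reader side: block until the given sync number is durable. */
        synchronized void awaitSync(long syncNumber) {
            boolean interrupted = false;
            while (lastSynced < syncNumber) {
                try {
                    wait();
                } catch (InterruptedException e) {
                    interrupted = true; // keep waiting; restore the flag on exit
                }
            }
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** One cell; its monitor plays the role of the row lock. */
    static final class Cell {
        private final LogSyncer syncer;
        private long value;
        private long syncNumber; // 0 means no sync is pending for this cell

        Cell(LogSyncer syncer) {
            this.syncer = syncer;
        }

        /** Update in memory and queue the edit under the lock; do not wait for the sync. */
        synchronized void put(long newValue) {
            value = newValue;
            syncNumber = syncer.queueEdit();
        }

        synchronized long pendingSyncNumber() {
            return syncNumber;
        }

        /** Snapshot under the lock, then wait for the covering sync with the lock released. */
        long get() {
            long snapshotValue;
            long snapshotSync;
            synchronized (this) {
                snapshotValue = value;
                snapshotSync = syncNumber;
            }
            syncer.awaitSync(snapshotSync);
            return snapshotValue;
        }
    }
}
```

In this sketch put() never blocks on the sync, and get() blocks only when the
sync covering the cell's latest update has not completed yet.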
===
Even in this scheme we will have to treat ICVs specially. Logically an ICV
does (a) a GET of the old value, (b) a PUT of the new value, and (c) a GET that
returns the new value. There are 2 cases:
(1) The ICV caller doesn't use the return value of the ICV. In this case the
ICV need not wait for the earlier sync to complete. (In my use case this is
what happens predominantly.)
(2) The ICV caller uses the return value of the ICV call to make further
updates. In this case the ICV has to wait for its sync to complete before it
returns. While the ICV is waiting for the sync to complete it need not hold the
row lock. (At least in my use case this is a very rare case)
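
The two cases could look roughly like this. Again, this is only a sketch under
my own assumptions: Wal, Counter, incrementNoResult and incrementAndGet are
made-up names and not the HBase API, the Counter monitor stands in for the row
lock, and syncs are assumed to complete in order so that waiting for the ICV's
own sync also covers every earlier one.

```java
// Illustrative sketch only; Wal and Counter are hypothetical, not HBase classes.
final class IcvSketch {

    /** Minimal stand-in for the WAL syncer, assuming syncs complete in order. */
    static final class Wal {
        private long nextSyncNumber = 1;
        private long lastSynced = 0;

        synchronized long queueEdit() {
            return nextSyncNumber;
        }

        synchronized void completeSync() {
            lastSynced = nextSyncNumber++;
            notifyAll();
        }

        synchronized boolean isSynced(long syncNumber) {
            return lastSynced >= syncNumber;
        }

        synchronized void awaitSync(long syncNumber) {
            boolean interrupted = false;
            while (lastSynced < syncNumber) {
                try {
                    wait();
                } catch (InterruptedException e) {
                    interrupted = true; // keep waiting; restore the flag on exit
                }
            }
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }

    static final class Counter {
        private final Wal wal;
        private long value;

        Counter(Wal wal) {
            this.wal = wal;
        }

        /** Steps (a) and (b) under the row lock: read, add, queue the edit. */
        private synchronized long[] apply(long delta) {
            value += delta;
            long syncNumber = wal.queueEdit();
            return new long[] { value, syncNumber };
        }

        /** Case (1): the caller ignores the result, so nothing waits for any sync. */
        void incrementNoResult(long delta) {
            apply(delta);
        }

        /** Case (2): the caller acts on the result, so wait for the sync, outside the row lock. */
        long incrementAndGet(long delta) {
            long[] result = apply(delta);
            wal.awaitSync(result[1]);
            return result[0];
        }
    }
}
```

The row lock is held only inside apply(); the wait in incrementAndGet() happens
after it has been released.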
===
I think that it is true in general that while a GET is forced to wait for a
sync to complete, there is no need to hold the row lock.
===
> Release row lock when waiting for wal-sync
> ------------------------------------------
>
> Key: HBASE-2957
> URL: https://issues.apache.org/jira/browse/HBASE-2957
> Project: HBase
> Issue Type: Improvement
> Components: regionserver, wal
> Affects Versions: 0.20.0
> Reporter: Prakash Khemani
>
> Is there a reason to hold on to the row-lock while waiting for the WAL-sync
> to be completed by the logSyncer thread?
> I think data consistency will be guaranteed even if the following happens (a)
> the row lock is held while the row is updated in memory (b) the row lock is
> released after queuing the KV record for WAL-syncing (c) the log-sync system
> guarantees that the log records for any given row are synced in order (d) the
> HBase client only receives a success notification after the sync completes
> (no change from the current state)
> I think this should be a huge win. For my use case, and I am sure for others,
> the handler thread spends the bulk of its row-lock critical section time
> waiting for sync to complete.
> Even if the log-sync system cannot guarantee the orderly completion of sync
> records, the "Don't hold row lock while waiting for sync" option should be
> available to HBase clients on a per request basis.
--
This message is automatically generated by JIRA.