[ 
https://issues.apache.org/jira/browse/PHOENIX-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104172#comment-16104172
 ] 

Vincent Poon commented on PHOENIX-4051:
---------------------------------------

Hey [~jamestaylor] , I see this in HRegion:

{code}
 // we should record the timestamp only after we have acquired the rowLock,
      // otherwise, newer puts/deletes are not guaranteed to have a newer 
timestamp
      now = EnvironmentEdgeManager.currentTimeMillis();
      byte[] byteNow = Bytes.toBytes(now);
{code}

So why would this patch be necessary?  I do see that in HRegion they set the 
timestamp before this code:
{code}
lock(this.updatesLock.readLock(), numReadyToWrite);
      locked = true;
{code}

So that would mean this is possible:
thread 1 generates ts1
thread 2 generates ts2 , and acquires the updatesLock
thread 1 then acquires the updatesLock afterwards and passes the Indexer a ts1 
which is out of order

In which case your patch gets around that.  I'm just trying to understand what 
updatesLock is, and why it's relevant when we have the rowlocks already 
(HRegion#doMiniBatchMutate Step 1)

LGTM - also would be nice if you could include your test as part of this patch?

> Prevent out-of-order updates for mutable index updates
> ------------------------------------------------------
>
>                 Key: PHOENIX-4051
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4051
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: James Taylor
>         Attachments: PHOENIX-4051_v1.patch
>
>
> Out-of-order processing of data rows during index maintenance causes mutable 
> indexes to become out of sync with regard to the data table. Here's a simple 
> example to illustrate the issue:
> # Assume table T(K,V) and index X(V,K).
> # Upsert T(A, 1) at t10. Index updates: Put X(1,A) at t10.
> # Upsert T(A, 3) at t30. Index updates: Delete X(1,A) at t29, Put X(3,A) at 
> t30.
> # Upsert T(A,2) at t20. Index updates: Delete X(1,A) at t19, Put X(2,A) at 
> t20, Delete X(2,A) at t29
> Ideally, we'd want to remove the Delete X(1,A) at t29 since this isn't 
> correct in terms of timeline consistency, but we can't do that with HBase 
> without support for deleting/undoing Delete markers. 
> The above is not what is occurring. Instead, when T(A,2) comes in, the Put 
> X(2,A) will occur at t20, but the Delete won't occur. This causes more index 
> rows than data rows, essentially making it invalid.
> A quick fix is to reset the timestamp of the data table mutations to the 
> current time within the preBatchMutate call, when the row is exclusively 
> locked. This skirts the issue because then timestamps won't overlap.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to