[ https://issues.apache.org/jira/browse/PHOENIX-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Taylor updated PHOENIX-4051:
----------------------------------
Attachment: PHOENIX-4051_v2.patch
Parking v2 of the patch with the feedback implemented. [~tdsilva] - I tried
using the latest timestamp from the client and couldn't reproduce the issue, so
that might be a simpler solution. I'm somewhat hesitant to make that change,
though, because for an UPSERT SELECT into the same table we rely on holding the
timestamp for the target table so that the select side doesn't see the rows
being written. Also, I think there may be cases where we can't prevent
out-of-order updates from occurring.
> Prevent out-of-order updates for mutable index updates
> ------------------------------------------------------
>
> Key: PHOENIX-4051
> URL: https://issues.apache.org/jira/browse/PHOENIX-4051
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
> Assignee: James Taylor
> Attachments: PHOENIX-4051_v1.patch, PHOENIX-4051_v2.patch
>
>
> Out-of-order processing of data rows during index maintenance causes mutable
> indexes to become out of sync with the data table. Here's a simple example to
> illustrate the issue:
> # Assume table T(K,V) and index X(V,K).
> # Upsert T(A, 1) at t10. Index updates: Put X(1,A) at t10.
> # Upsert T(A, 3) at t30. Index updates: Delete X(1,A) at t29, Put X(3,A) at
> t30.
> # Upsert T(A, 2) at t20. Index updates: Delete X(1,A) at t19, Put X(2,A) at
> t20, Delete X(2,A) at t29.
> Ideally, we'd want to remove the Delete X(1,A) at t29, since it isn't
> correct in terms of timeline consistency, but we can't do that with HBase
> without support for deleting/undoing Delete markers.
> The above is not what actually occurs, however. Instead, when T(A, 2) comes
> in, the Put X(2,A) occurs at t20, but the Delete doesn't. This leaves more
> index rows than data rows, essentially making the index invalid.
> A quick fix is to reset the timestamp of the data table mutations to the
> current time within the preBatchMutate call, while the row is exclusively
> locked. This sidesteps the issue because timestamps are then assigned in the
> order the mutations are processed, so they won't overlap.
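
The timeline in the description can be replayed with a small in-memory model.
This is only a sketch, not HBase or Phoenix code: Store, replay, and
maintain_with_server_time are hypothetical names, the model assumes a Delete
marker at timestamp t hides every Put with a timestamp <= t, and it reads "the
Delete won't occur" as "only the Put X(2,A) at t20 is applied". The last
function imitates the quick fix by stamping each upsert with a fresh,
increasing timestamp in arrival order, standing in for the current time taken
under the row lock.

```python
from collections import defaultdict


class Store:
    """Toy versioned store: a delete marker at ts hides puts with timestamp <= ts."""

    def __init__(self):
        self.puts = defaultdict(list)
        self.deletes = defaultdict(list)

    def put(self, key, ts):
        self.puts[key].append(ts)

    def delete(self, key, ts):
        self.deletes[key].append(ts)

    def visible(self):
        """Return the keys whose newest put is newer than their newest delete marker."""
        rows = set()
        for key, put_ts in self.puts.items():
            newest_delete = max(self.deletes.get(key, []), default=-1)
            if max(put_ts) > newest_delete:
                rows.add(key)
        return rows


def replay(index_ops):
    """Apply (op, index_key, ts) tuples to a fresh store and return the visible index rows."""
    store = Store()
    for op, key, ts in index_ops:
        if op == "put":
            store.put(key, ts)
        else:
            store.delete(key, ts)
    return store.visible()


# Index updates for Upsert T(A, 1) at t10 and Upsert T(A, 3) at t30.
FIRST_TWO = [("put", (1, "A"), 10), ("delete", (1, "A"), 29), ("put", (3, "A"), 30)]
# Index updates for the late Upsert T(A, 2) at t20, as listed in the description.
IDEAL_LATE = [("delete", (1, "A"), 19), ("put", (2, "A"), 20), ("delete", (2, "A"), 29)]
# Assumed reading of the observed behavior: only the Put is applied.
ACTUAL_LATE = [("put", (2, "A"), 20)]


def maintain_with_server_time(upserts, start_ts=100):
    """Quick-fix sketch: ignore client timestamps and stamp upserts in arrival order."""
    index = Store()
    data = {}  # data table: key -> current value
    for offset, (key, value, _client_ts) in enumerate(upserts):
        ts = start_ts + 10 * offset  # stand-in for "current time" under the row lock
        if key in data:
            index.delete((data[key], key), ts - 1)  # retire the previous index row
        index.put((value, key), ts)
        data[key] = value
    return data, index.visible()


if __name__ == "__main__":
    print(sorted(replay(FIRST_TWO + IDEAL_LATE)))   # [(3, 'A')]
    print(sorted(replay(FIRST_TWO + ACTUAL_LATE)))  # [(2, 'A'), (3, 'A')]
    arrivals = [("A", 1, 10), ("A", 3, 30), ("A", 2, 20)]
    print(maintain_with_server_time(arrivals))      # ({'A': 2}, {(2, 'A')})
```

In this model the missing Delete leaves two index rows for the single data row
A, while re-stamping in arrival order leaves exactly one. Note that re-stamping
also makes the last upsert to arrive (value 2) the winner, rather than the one
with the highest client timestamp.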
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)