[
https://issues.apache.org/jira/browse/PHOENIX-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059084#comment-17059084
]
Kadir OZDEMIR commented on PHOENIX-5768:
----------------------------------------
[~gjacoby], your understanding is correct. Let me try to be clearer about the
scenarios where cells from unverified rows can be returned. This can happen
when a partial but verified index row is scanned and HBase also returns the
cells (for the columns that are not covered by the partial update) from the
previously written unverified index rows along with the cells from the partial
but verified index row. There are two cases in which these unverified rows
arise.
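To make the scan behavior above concrete before going through the two cases,
here is a minimal toy model in Java. It is not Phoenix or HBase code: the class
IndexRowSketch, the "_status" marker column, and the column names are all
hypothetical, and the only behavior it assumes is that a scan returns the
latest version of each column.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: not Phoenix or HBase code. All names here are hypothetical.
class IndexRowSketch {
    // Stands in for whatever marks an index row as verified or unverified.
    static final String STATUS = "_status";

    // column -> (timestamp -> value); mimics a store that keeps versions per column.
    private final Map<String, TreeMap<Long, String>> columns = new TreeMap<>();

    // Write the given cells, plus the status marker, at one timestamp.
    void put(long ts, boolean verified, Map<String, String> cells) {
        for (Map.Entry<String, String> e : cells.entrySet()) {
            columns.computeIfAbsent(e.getKey(), k -> new TreeMap<>()).put(ts, e.getValue());
        }
        columns.computeIfAbsent(STATUS, k -> new TreeMap<>())
               .put(ts, verified ? "verified" : "unverified");
    }

    // Latest version of every column, as a plain scan would return it.
    Map<String, String> scanLatest() {
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, TreeMap<Long, String>> e : columns.entrySet()) {
            out.put(e.getKey(), e.getValue().lastEntry().getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        IndexRowSketch row = new IndexRowSketch();
        // Full index row that was left unverified.
        row.put(1L, false, Map.of("a", "a1", "b", "b1"));
        // Later partial update that covers only column "a" and is verified.
        row.put(2L, true, Map.of("a", "a2"));
        // The row now looks verified, yet column "b" still comes from timestamp 1.
        System.out.println(row.scanLatest());
    }
}
```

In this sketch the row reads as verified after the second put, while the value
of column "b" still originates from the earlier unverified write.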
The first case is due to a failure in the last index write phase. In this case,
both the first index write phase and the data write phase are successful for
this unverified index row. We do not consider this a failure and we do not
return an exception to the application when this happens. After this, the
application updates the row partially and all three write phases succeed.
Since the data table row was written successfully for the unverified index
row, it is safe to return cells from such unverified rows.
The second case is due to data table row write failures. In this case, the
unverified index row will be written, then the data row write failure will lead
to an exception, and finally this exception will be returned to the
application. We expect that the application will retry this write and the write
will eventually be successful. If the unverified index row is scanned before
the data table row write is successful, nothing will be returned to the
application for this unverified index row. The issue arises if the application
updates the row partially before successfully completing the previous write.
What I argue is that even in this case, it is acceptable to return cells from
the unverified rows for immutable index tables.
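The two cases can be contrasted with a small Java sketch. This is a toy model
under stated assumptions, not Phoenix code: the class ThreePhaseWriteSketch,
the FailAt enum, and the status strings are hypothetical, and it assumes the
last index write phase is the step that turns an unverified index row into a
verified one.

```java
// Toy model only: not Phoenix code. Class, enum, and field names are hypothetical.
class ThreePhaseWriteSketch {
    enum FailAt { NONE, DATA_WRITE, LAST_INDEX_WRITE }

    String indexStatus = "absent";   // "absent", "unverified", or "verified"
    boolean dataRowWritten = false;

    void write(FailAt failAt) {
        // Phase 1: write the index row as unverified.
        indexStatus = "unverified";
        // Phase 2: write the data table row; a failure here is reported to the caller.
        if (failAt == FailAt.DATA_WRITE) {
            throw new IllegalStateException("data table row write failed");
        }
        dataRowWritten = true;
        // Phase 3 (assumed): mark the index row verified; a failure here is not reported.
        if (failAt == FailAt.LAST_INDEX_WRITE) {
            return;
        }
        indexStatus = "verified";
    }

    public static void main(String[] args) {
        // First case: the last phase fails silently, but the data row exists.
        ThreePhaseWriteSketch first = new ThreePhaseWriteSketch();
        first.write(FailAt.LAST_INDEX_WRITE);
        System.out.println(first.indexStatus + " / data written: " + first.dataRowWritten);

        // Second case: the data write fails and the application sees an exception.
        ThreePhaseWriteSketch second = new ThreePhaseWriteSketch();
        try {
            second.write(FailAt.DATA_WRITE);
        } catch (IllegalStateException e) {
            System.out.println(second.indexStatus + " / data written: " + second.dataRowWritten);
        }
    }
}
```

Both cases leave an unverified index row behind; they differ in whether the
data table row exists and whether the application was told about the failure.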
You mentioned the case where multiple clients update the same row concurrently
on a supposedly immutable table. Handling that is too much to expect from
this very simplistic indexing implementation for immutable tables. I would say
the application should never do that for immutable tables. I think that is
crossing the line and the application should use mutable tables in that case.
My suggestion would be to use immutable indexes for really immutable tables and
for these tables, overwrites should be due to failures and should be
idempotent. For everything else, mutable tables should be used.
> Supporting partial overwrites for immutable tables with indexes
> ---------------------------------------------------------------
>
> Key: PHOENIX-5768
> URL: https://issues.apache.org/jira/browse/PHOENIX-5768
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 5.0.0, 4.14.3
> Reporter: Kadir OZDEMIR
> Assignee: Kadir OZDEMIR
> Priority: Critical
> Attachments: PHOENIX-5678.master.001.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Phoenix allows immutable tables with indexes to be overwritten partially as
> long as the indexed columns are not updated during partial overwrites.
> However, there is no check/enforcement for this. The immutable index
> mutations are prepared on the client side without reading the existing data
> table rows. This means the index mutations prepared by the client will be
> partial when the data table row mutations are partial. The new indexing
> design assumes index rows are always full and all cells within an index row
> have the same timestamp. On the read path, GlobalIndexChecker returns only
> the cells with the most recent timestamp of the row. This means that if the
> client updates the same row multiple times, the client will read back only
> the most recent update which could be partial.