[
https://issues.apache.org/jira/browse/PHOENIX-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057227#comment-17057227
]
Kadir OZDEMIR commented on PHOENIX-5768:
----------------------------------------
[~gjacoby] and [~giskender], unfortunately, we cannot determine whether a row
update is full or partial without looking at the current state of the data row
when the update does not cover all columns. A row update can be just a put
mutation, just a delete mutation (a set of delete column cells), or a pair of
put and delete mutations. Even if we consider all these combinations, we
cannot always determine whether an update is a full row update. This is
because "full row update" here does not mean that we have values for all
columns of an index row. The first update on a row must always be considered
full regardless of how many columns it covers, and we cannot determine whether
an update is the first or a subsequent one without checking whether the row
already exists. A subsequent update should be considered partial only if it
does not cover the previously covered columns. Again, we cannot determine this
without retrieving the existing row state. This means option 2 may produce
many unverified rows unnecessarily, because we would have to label every
update that does not cover all columns as partial.
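
To make the difference concrete, here is a minimal sketch of the two
classification rules described above. It is an illustration only, not Phoenix
code: the class and method names are hypothetical, columns are modeled as plain
strings, and "covers previously covered columns" is read as "touches every
column already present in the row".

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; none of these names exist in Phoenix.
final class UpdateClassifier {
    enum Kind { FULL, PARTIAL }

    // Columns touched by an update: put columns plus delete column cells.
    private static Set<String> touched(Set<String> putColumns, Set<String> deleteColumns) {
        Set<String> all = new HashSet<>(putColumns);
        all.addAll(deleteColumns);
        return all;
    }

    // Rule that needs the existing row state.
    // existingColumns == null means the data row does not exist yet.
    static Kind classifyWithRowState(Set<String> existingColumns,
                                     Set<String> putColumns,
                                     Set<String> deleteColumns) {
        if (existingColumns == null) {
            return Kind.FULL; // the first update is always full
        }
        // A subsequent update is partial only if it leaves some
        // previously covered column untouched.
        return touched(putColumns, deleteColumns).containsAll(existingColumns)
                ? Kind.FULL : Kind.PARTIAL;
    }

    // Conservative rule without reading the row (assumed shape of option 2):
    // anything that does not cover every column is labeled partial.
    static Kind classifyWithoutRowState(Set<String> allColumns,
                                        Set<String> putColumns,
                                        Set<String> deleteColumns) {
        return touched(putColumns, deleteColumns).containsAll(allColumns)
                ? Kind.FULL : Kind.PARTIAL;
    }
}
```

With columns A, B and C, a first update that writes only A is full under the
first rule but partial under the second, which is where the unnecessary
unverified rows would come from.
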
> Supporting partial overwrites for immutable tables with indexes
> ---------------------------------------------------------------
>
> Key: PHOENIX-5768
> URL: https://issues.apache.org/jira/browse/PHOENIX-5768
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 5.0.0, 4.14.3
> Reporter: Kadir OZDEMIR
> Assignee: Kadir OZDEMIR
> Priority: Critical
>
> Phoenix allows immutable tables with indexes to be overwritten partially as
> long as the indexed columns are not updated during partial overwrites.
> However, there is no check/enforcement for this. The immutable index
> mutations are prepared on the client side without reading the existing data
> table rows. This means the index mutations prepared by the client will be
> partial when the data table row mutations are partial. The new indexing
> design assumes index rows are always full and all cells within an index row
> have the same timestamp. On the read path, GlobalIndexChecker returns only
> the cells with the most recent timestamp of the row. This means that if the
> client updates the same row multiple times, the client will read back only
> the most recent update, which could be partial.
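
The read-path behavior described in the issue can be sketched as follows. This
is a simplified model, not the actual GlobalIndexChecker implementation:
`SketchCell` and `latestCells` are hypothetical names, and a row is modeled as
a flat list of cells.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "return only the cells with the most recent timestamp".
final class LatestTimestampFilter {
    static final class SketchCell {
        final String column;
        final String value;
        final long timestamp;

        SketchCell(String column, String value, long timestamp) {
            this.column = column;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    static List<SketchCell> latestCells(List<SketchCell> rowCells) {
        // Find the most recent timestamp in the row.
        long maxTs = Long.MIN_VALUE;
        for (SketchCell c : rowCells) {
            maxTs = Math.max(maxTs, c.timestamp);
        }
        // Keep only the cells written at that timestamp.
        List<SketchCell> result = new ArrayList<>();
        for (SketchCell c : rowCells) {
            if (c.timestamp == maxTs) {
                result.add(c);
            }
        }
        return result;
    }
}
```

If a row is first written with columns A and B at timestamp 10 and then only B
is overwritten at timestamp 20, this filter returns just the B cell, so the
value of A is no longer visible to the reader.
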
--
This message was sent by Atlassian Jira
(v8.3.4#803005)