[ 
https://issues.apache.org/jira/browse/PHOENIX-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056443#comment-17056443
 ] 

Kadir OZDEMIR commented on PHOENIX-5768:
----------------------------------------

The old indexing design always generates index updates for mutable indexes as 
full rows. However, it allows partial index rows for immutable tables. The new 
design also follows the full-row update choice, but it assumes that this holds 
for both mutable and immutable indexes. Given that we need to support the 
existing behavior, the new design should also support partial index updates. 
Partial index updates lead to a correctness issue in both the old and new 
designs. This happens as follows:
 # An index row is updated but the corresponding data table row update fails. 
This leaves an orphan index row (i.e., no corresponding data table row). 
 # The same row is partially updated on both the data and index table.
 # A client reads this row back from the index table. If the read includes 
columns that were not updated in step 2, it gets those column values from the 
orphan index row (if the orphan row includes these columns). 
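
The three steps above can be sketched with a small in-memory model. This is 
only an illustration, not Phoenix code: the put/read helpers, the table 
layout, and the row, column, and timestamp values are all hypothetical. It 
assumes a read returns the newest version of each column independently, which 
is how a stale column from the orphan row can leak into the result.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory model: table[row_key][column] = (timestamp, value).
Table = Dict[str, Dict[str, Tuple[int, str]]]


def put(table: Table, row_key: str, ts: int, cells: Dict[str, str]) -> None:
    """Write cells, keeping the newest version of each column."""
    row = table.setdefault(row_key, {})
    for column, value in cells.items():
        if column not in row or row[column][0] <= ts:
            row[column] = (ts, value)


def read(table: Table, row_key: str, column: str) -> Optional[str]:
    """Return the newest value of a column, or None if it was never written."""
    cell = table.get(row_key, {}).get(column)
    return cell[1] if cell else None


def simulate_orphan_leak() -> Tuple[Optional[str], Optional[str]]:
    data_table: Table = {}
    index_table: Table = {}
    # Step 1: the index write succeeds, the data table write fails
    # (so nothing is stored in data_table). The index row is now an orphan.
    put(index_table, "row1", ts=1, cells={"A": "a1", "B": "b1"})
    # Step 2: a partial update of column B succeeds on both tables.
    put(data_table, "row1", ts=2, cells={"B": "b2"})
    put(index_table, "row1", ts=2, cells={"B": "b2"})
    # Step 3: read column A, which step 2 did not touch.
    return read(data_table, "row1", "A"), read(index_table, "row1", "A")
```

In this model the data table has no value for column A, while the index table 
still serves the value written by the failed update in step 1.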

> Supporting partial overwrites for immutable tables with indexes
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-5768
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5768
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.0.0, 4.14.3
>            Reporter: Kadir OZDEMIR
>            Assignee: Kadir OZDEMIR
>            Priority: Critical
>
> Phoenix allows immutable tables with indexes to be overwritten partially as 
> long as the indexed columns are not updated during partial overwrites. 
> However, there is no check/enforcement for this. The immutable index 
> mutations are prepared on the client side without reading the existing data 
> table rows. This means the index mutations prepared by the client will be 
> partial when the data table row mutations are partial. The new indexing 
> design assumes index rows are always full and all cells within an index row 
> have the same timestamp. On the read path, GlobalIndexChecker returns only 
> the cells with the most recent timestamp of the row. This means that if the 
> client updates the same row multiple times, the client will read back only 
> the most recent update, which could be partial.
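
The read-path behavior described in the issue text above can be modeled 
roughly as follows. This is a sketch under stated assumptions, not the actual 
GlobalIndexChecker implementation: the Cell tuple, the function name, and the 
sample values are hypothetical. It only shows why keeping cells with the most 
recent row timestamp hides columns written by an earlier update.

```python
from typing import Dict, List, Tuple

# Hypothetical cell representation: (column, timestamp, value).
Cell = Tuple[str, int, str]


def latest_timestamp_cells(cells: List[Cell]) -> Dict[str, str]:
    """Keep only the cells that carry the newest timestamp in the row."""
    if not cells:
        return {}
    newest = max(ts for _, ts, _ in cells)
    return {col: val for col, ts, val in cells if ts == newest}


# A full index row written at timestamp 1, then a partial update of B at 2.
index_row = [("A", 1, "a1"), ("B", 1, "b1"), ("B", 2, "b2")]
visible = latest_timestamp_cells(index_row)
```

Here only column B is visible to the reader, because the cell for column A 
carries the older timestamp and is filtered out.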



