[
https://issues.apache.org/jira/browse/PHOENIX-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056477#comment-17056477
]
Kadir OZDEMIR commented on PHOENIX-5768:
----------------------------------------
Because of the correctness issue mentioned above, PHOENIX-5708 made sure that
the new index implementation returns only the cells from the last update for a
given row. Although this fixes the above correctness issue, it creates another
one when there are no orphan index rows but there are partial overwrites. In the
new design, the orphan index rows are unverified rows. There are two types of
solutions for this issue:
# Full row update during write: We can detect partial writes while preparing
index updates on the client side, read the data table, add missing column
values and prepare full index updates as we do for mutable tables. We can
either prepare the full index row on the client side or treat this partial
write as if it were on a mutable table and leverage the server side
implementation.
# Full row update during read: We can detect partial writes while preparing
index updates on the client side and skip the last index write phase for
them. This leaves the index rows unverified. During read, the read repair will
repair these rows and turn them into full row updates.
[~giskender], [~gjacoby], I prefer option 2 as it is simpler to implement and
does not impact write performance. Any thoughts on this?
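
For illustration only, here is a toy model of the problem with partial overwrites. It is not Phoenix or HBase code: the Cell class, the method names and the sample values are all hypothetical. It assumes a row is just a list of (column, value, timestamp) cells and contrasts a filter that keeps only the cells with the most recent timestamp against a merge that keeps the newest value per column, which is what a full row update would have to contain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: none of these types are Phoenix or HBase classes.
final class PartialOverwriteSketch {

    // One index-row cell: column name, value and write timestamp.
    static final class Cell {
        final String column;
        final String value;
        final long timestamp;

        Cell(String column, String value, long timestamp) {
            this.column = column;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Keeps only the cells that carry the row's most recent timestamp,
    // mimicking the read-path behavior described in this issue.
    static Map<String, String> latestTimestampOnly(List<Cell> row) {
        long max = Long.MIN_VALUE;
        for (Cell c : row) {
            max = Math.max(max, c.timestamp);
        }
        Map<String, String> out = new TreeMap<>();
        for (Cell c : row) {
            if (c.timestamp == max) {
                out.put(c.column, c.value);
            }
        }
        return out;
    }

    // Keeps the newest value of every column, i.e. the content a full
    // row update would need to carry.
    static Map<String, String> newestPerColumn(List<Cell> row) {
        Map<String, Cell> newest = new TreeMap<>();
        for (Cell c : row) {
            Cell prev = newest.get(c.column);
            if (prev == null || c.timestamp > prev.timestamp) {
                newest.put(c.column, c);
            }
        }
        Map<String, String> out = new TreeMap<>();
        for (Cell c : newest.values()) {
            out.put(c.column, c.value);
        }
        return out;
    }

    // Full write of columns A and B at ts=1, then a partial overwrite of B at ts=2.
    static List<Cell> sampleRow() {
        List<Cell> row = new ArrayList<>();
        row.add(new Cell("A", "a1", 1L));
        row.add(new Cell("B", "b1", 1L));
        row.add(new Cell("B", "b2", 2L));
        return row;
    }

    public static void main(String[] args) {
        List<Cell> row = sampleRow();
        System.out.println(latestTimestampOnly(row)); // {B=b2}
        System.out.println(newestPerColumn(row));     // {A=a1, B=b2}
    }
}
```

In this model column A is lost by the latest-timestamp filter after the partial overwrite, which is the gap that either option above would have to close.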
> Supporting partial overwrites for immutable tables with indexes
> ---------------------------------------------------------------
>
> Key: PHOENIX-5768
> URL: https://issues.apache.org/jira/browse/PHOENIX-5768
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 5.0.0, 4.14.3
> Reporter: Kadir OZDEMIR
> Assignee: Kadir OZDEMIR
> Priority: Critical
>
> Phoenix allows immutable tables with indexes to be overwritten partially as
> long as the indexed columns are not updated during partial overwrites.
> However, there is no check/enforcement for this. The immutable index
> mutations are prepared on the client side without reading the existing data
> table rows. This means the index mutations prepared by the client will be
> partial when the data table row mutations are partial. The new indexing
> design assumes index rows are always full and all cells within an index row
> have the same timestamp. On the read path, GlobalIndexChecker returns only
> the cells with the most recent timestamp of the row. This means that if the
> client updates the same row multiple times, the client will read back only
> the most recent update which could be partial.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)