[
https://issues.apache.org/jira/browse/PHOENIX-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kadir OZDEMIR updated PHOENIX-5535:
-----------------------------------
Summary: Replay delete markers during server side global index rebuild
(was: Index rebuilds via UngroupedAggregateRegionObserver should replay delete
markers)
> Replay delete markers during server side global index rebuild
> --------------------------------------------------------------
>
> Key: PHOENIX-5535
> URL: https://issues.apache.org/jira/browse/PHOENIX-5535
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 5.0.0, 4.14.3
> Reporter: Kadir OZDEMIR
> Assignee: Kadir OZDEMIR
> Priority: Blocker
> Fix For: 4.15.0, 5.1.0
>
> Attachments: PHOENIX-5535.4.x-HBase-1.5.001.patch,
> PHOENIX-5535.master.001.patch, PHOENIX-5535.master.002.patch,
> PHOENIX-5535.master.003.patch
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> Currently, index rebuilds for global index tables are done on the server side.
> The Phoenix client generates an aggregate plan using ServerBuildIndexCompiler
> to scan every data table row on the server side. This compiler sets the scan
> attributes so that the row mutations scanned by
> UngroupedAggregateRegionObserver are replayed on the data table in order to
> rebuild the index table rows. During this replay, data table row updates are
> skipped and only index table rows are updated.
> Phoenix allows column entries to have null values. Null values are
> represented by HBase column delete markers. This means that an index rebuild
> must replay these delete markers along with put mutations. To do that,
> ServerBuildIndexCompiler should use raw scans, but it currently uses regular
> scans. This leads to incorrect index rebuilds when null values are used.
> A simple test using a data table with one global index whose covered column
> can take a null value is sufficient to reproduce this problem.
> # Create a data table with columns a, b, and c where a is the primary key
> and c can be null
> # Write one row with non-null values
> # Overwrite the covered column with null (i.e., set it to null)
> # Create an index on the table where b is the secondary key and c is a covered
> column
> # Rebuild the index
> # Dump the index table
> The index table row should have a null value for the covered column.
> However, it has the non-null value written at step 2.
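
The raw-scan point in the description can be illustrated with a toy model. This is only a sketch, not Phoenix or HBase code: the Cell class, raw_scan, regular_scan, and the sample values are hypothetical names made up for illustration, and the model assumes a single visible version per column. It shows that a regular scan hides the column delete marker that represents the null, so a replay driven by that scan has no delete to apply, while a raw scan still returns the marker.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for one stored cell version."""
    row: str
    column: str
    ts: int
    value: Optional[str] = None
    is_delete: bool = False  # True models a column delete marker (a null)


def raw_scan(cells: List[Cell]) -> List[Cell]:
    """Return every cell, delete markers included, newest first per column."""
    return sorted(cells, key=lambda c: (c.row, c.column, -c.ts))


def regular_scan(cells: List[Cell]) -> List[Cell]:
    """Return only the newest live value per column.

    If the newest cell of a column is a delete marker, the column is
    omitted entirely: the marker and the puts it masks are both hidden.
    """
    visible = []
    seen = set()
    for cell in raw_scan(cells):
        key = (cell.row, cell.column)
        if key in seen:
            continue  # older version, already decided by a newer cell
        seen.add(key)
        if not cell.is_delete:
            visible.append(cell)
    return visible


# Steps 2 and 3 of the reproduction, with made-up values and timestamps:
# write b and c at ts=1, then set c to null at ts=2.
cells = [
    Cell(row="a1", column="b", ts=1, value="b1"),
    Cell(row="a1", column="c", ts=1, value="c1"),
    Cell(row="a1", column="c", ts=2, is_delete=True),
]

if __name__ == "__main__":
    print("regular scan:", regular_scan(cells))
    print("raw scan:    ", raw_scan(cells))
```

In this model the regular scan returns only the cell for column b, so nothing in its output tells the replay that column c was set to null. The raw scan returns all three cells, including the marker for c.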