Kadir OZDEMIR created PHOENIX-5535:
--------------------------------------
Summary: Index rebuilds via UngroupedAggregateRegionObserver
should replay delete markers
Key: PHOENIX-5535
URL: https://issues.apache.org/jira/browse/PHOENIX-5535
Project: Phoenix
Issue Type: Bug
Affects Versions: 4.14.3, 5.0.0
Reporter: Kadir OZDEMIR
Assignee: Kadir OZDEMIR
Fix For: 4.15.0, 5.1.0
Currently, index rebuilds for global index tables are done on the server side.
The Phoenix client generates an aggregate plan using ServerBuildIndexCompiler to
scan every data table row on the server side. This compiler sets the scan
attributes so that the row mutations scanned by
UngroupedAggregateRegionObserver are replayed on the data table, which rebuilds
the index table rows. During this replay, data table row updates are skipped and
only index table rows are updated.
Phoenix allows column entries to have null values. Null values are represented
by HBase column delete markers. This means that an index rebuild must replay
these delete markers along with put mutations. To do that,
ServerBuildIndexCompiler should use raw scans, but it currently uses regular
scans. This leads to incorrect index rebuilds when null values are used.
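
The difference between the two scan types can be illustrated with a small,
self-contained model. This is only a sketch and not Phoenix or HBase code: the
class RawScanSketch, its Cell type, and the replay method are hypothetical
names, a null value stands in for a column delete marker, and the index row is
assumed to already hold the old covered-column value before the replay.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RawScanSketch {
    /** One version of a single column; a null value models a column delete marker. */
    public static final class Cell {
        final long timestamp;
        final String value;

        public Cell(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }

        boolean isDeleteMarker() {
            return value == null;
        }
    }

    /** Regular scan: hides delete markers and every put at or below the newest marker. */
    public static List<Cell> regularScan(List<Cell> cells) {
        long newestDelete = Long.MIN_VALUE;
        for (Cell c : cells) {
            if (c.isDeleteMarker()) {
                newestDelete = Math.max(newestDelete, c.timestamp);
            }
        }
        List<Cell> visible = new ArrayList<>();
        for (Cell c : cells) {
            if (!c.isDeleteMarker() && c.timestamp > newestDelete) {
                visible.add(c);
            }
        }
        return visible;
    }

    /** Raw scan: returns every cell, delete markers included. */
    public static List<Cell> rawScan(List<Cell> cells) {
        return new ArrayList<>(cells);
    }

    /** Replays the scanned cells in timestamp order onto the covered column of an index row. */
    public static String replay(String indexValue, List<Cell> scanned) {
        List<Cell> ordered = new ArrayList<>(scanned);
        ordered.sort(Comparator.comparingLong((Cell c) -> c.timestamp));
        for (Cell c : ordered) {
            indexValue = c.value; // a delete marker resets the column to null
        }
        return indexValue;
    }

    public static void main(String[] args) {
        List<Cell> columnC = new ArrayList<>();
        columnC.add(new Cell(1L, "c1")); // non-null value written first
        columnC.add(new Cell(2L, null)); // later overwritten with null
        System.out.println("regular: " + replay("c1", regularScan(columnC)));
        System.out.println("raw: " + replay("c1", rawScan(columnC)));
    }
}
```

In this model the regular scan returns no cells for the column, so the replay
has nothing to apply and the index keeps "c1". The raw scan returns the delete
marker as well, so the replay ends with null.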
A simple test using a data table that has one global index with a covered
column that can take a null value is sufficient to reproduce this problem.
# Create a data table with columns a, b, and c where a is the primary key
and c can have a null value
# Write one row with non-null values
# Overwrite the covered column with null (i.e., set it to null)
# Create an index on the table where b is the secondary key and c is a covered
column
# Rebuild the index
# Dump the index table
The index table row should have the null value for the covered column. However,
it has the non-null value written at step 2.
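
The steps above could be driven through JDBC roughly as follows. This is a
sketch under assumptions, not the test attached to the issue: the table name
T5535, the index name I5535, the column types, and the class name
Phoenix5535Repro are hypothetical, the rebuild step is assumed to correspond to
ALTER INDEX ... REBUILD, and the Phoenix JDBC URL must be supplied as the first
argument with the Phoenix client on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

public class Phoenix5535Repro {
    /** Query used for step 6 (dump the index table); the index name is hypothetical. */
    public static final String DUMP_QUERY = "SELECT * FROM I5535";

    /** Steps 1 to 5 as Phoenix SQL; table, index, and column types are assumptions. */
    public static List<String> setupStatements() {
        return Arrays.asList(
            "CREATE TABLE T5535 (A VARCHAR PRIMARY KEY, B VARCHAR, C VARCHAR)",
            "UPSERT INTO T5535 VALUES ('a', 'b', 'c')",
            "UPSERT INTO T5535 (A, C) VALUES ('a', NULL)",
            "CREATE INDEX I5535 ON T5535 (B) INCLUDE (C)",
            "ALTER INDEX I5535 ON T5535 REBUILD");
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            // Without a JDBC URL, only print the statements that would be run.
            for (String sql : setupStatements()) {
                System.out.println(sql);
            }
            System.out.println(DUMP_QUERY);
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0]);
             Statement stmt = conn.createStatement()) {
            conn.setAutoCommit(true); // commit each UPSERT immediately
            for (String sql : setupStatements()) {
                stmt.execute(sql);
            }
            try (ResultSet rs = stmt.executeQuery(DUMP_QUERY)) {
                int columns = rs.getMetaData().getColumnCount();
                while (rs.next()) {
                    StringBuilder row = new StringBuilder();
                    for (int i = 1; i <= columns; i++) {
                        if (i > 1) {
                            row.append(" | ");
                        }
                        row.append(rs.getString(i));
                    }
                    System.out.println(row);
                }
            }
        }
    }
}
```

If the problem is present, the dumped index row would be expected to show 'c'
in the covered column instead of null.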