Kadir OZDEMIR created PHOENIX-5535:
--------------------------------------

             Summary: Index rebuilds via UngroupedAggregateRegionObserver 
should replay delete markers
                 Key: PHOENIX-5535
                 URL: https://issues.apache.org/jira/browse/PHOENIX-5535
             Project: Phoenix
          Issue Type: Bug
    Affects Versions: 4.14.3, 5.0.0
            Reporter: Kadir OZDEMIR
            Assignee: Kadir OZDEMIR
             Fix For: 4.15.0, 5.1.0


Currently, index rebuilds for global index tables are done on the server side. 
The Phoenix client generates an aggregate plan using ServerBuildIndexCompiler to 
scan every data table row on the server side. This compiler sets the scan 
attributes so that the row mutations scanned by 
UngroupedAggregateRegionObserver are replayed on the data table, which rebuilds 
the index table rows. During this replay, data table row updates are skipped and 
only index table rows are updated.

Phoenix allows column entries to have null values. Null values are represented 
by HBase column delete markers. This means that an index rebuild must replay 
these delete markers along with put mutations. To do that, 
ServerBuildIndexCompiler should use raw scans, but it currently uses regular 
scans. This leads to incorrect index rebuilds when null values are used.
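
The difference between the two scan types can be illustrated with a small 
self-contained sketch. It is a toy model, not Phoenix or HBase code: the class 
RawScanSketch, the Cell type, and the column and timestamp values are all 
hypothetical. It only assumes the documented HBase behavior that a regular scan 
hides delete markers and the cells they mask, while a raw scan (Scan.setRaw(true) 
in the HBase client API) returns them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of one data table row; none of these types are HBase classes.
public class RawScanSketch {
    enum Type { PUT, DELETE_COLUMN }

    // One versioned cell of the row (hypothetical stand-in for an HBase cell).
    static final class Cell {
        final String column;
        final long ts;
        final Type type;
        final String value;

        Cell(String column, long ts, Type type, String value) {
            this.column = column;
            this.ts = ts;
            this.type = type;
            this.value = value;
        }

        @Override
        public String toString() {
            return column + "@" + ts + " " + type + (value == null ? "" : "=" + value);
        }
    }

    // Write b and c with non-null values at ts=1, then set c to null at ts=2.
    // Setting c to null is modeled as a column delete marker.
    static List<Cell> sampleRow() {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("b", 1, Type.PUT, "b1"));
        cells.add(new Cell("c", 1, Type.PUT, "c1"));
        cells.add(new Cell("c", 2, Type.DELETE_COLUMN, null));
        return cells;
    }

    // Regular scan: newest visible put per column. Delete markers and the
    // puts they mask (same column, timestamp <= marker) are not returned.
    static List<Cell> regularScan(List<Cell> cells) {
        Map<String, Long> deletedUpTo = new HashMap<>();
        for (Cell c : cells) {
            if (c.type == Type.DELETE_COLUMN) {
                deletedUpTo.merge(c.column, c.ts, Math::max);
            }
        }
        Map<String, Cell> latest = new TreeMap<>();
        for (Cell c : cells) {
            if (c.type != Type.PUT) continue;
            if (c.ts <= deletedUpTo.getOrDefault(c.column, Long.MIN_VALUE)) continue;
            latest.merge(c.column, c, (x, y) -> x.ts >= y.ts ? x : y);
        }
        return new ArrayList<>(latest.values());
    }

    // Raw scan: every cell, including delete markers and masked puts,
    // ordered by column and then newest first.
    static List<Cell> rawScan(List<Cell> cells) {
        List<Cell> out = new ArrayList<>(cells);
        out.sort((x, y) -> {
            int byColumn = x.column.compareTo(y.column);
            return byColumn != 0 ? byColumn : Long.compare(y.ts, x.ts);
        });
        return out;
    }

    public static void main(String[] args) {
        List<Cell> row = sampleRow();
        System.out.println("regular scan: " + regularScan(row));
        System.out.println("raw scan:     " + rawScan(row));
    }
}
```

In this model the regular scan carries no trace of the delete on c, so a replay 
driven by it has no delete marker to apply. The raw scan exposes the marker, 
which is what the rebuild needs in order to replay it.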

A simple test using a data table with one global index, where the index has a 
covered column that can be null, is sufficient to reproduce this problem.
 # Create a data table with columns a, b, and c, where a is the primary key and c can be null
 # Write one row with non-null values
 # Overwrite the covered column c with null
 # Create an index on the table where b is the secondary key and c is a covered column
 # Rebuild the index
 # Dump the index table

The index table row should have a null value for the covered column. However, 
it has the non-null value written at step 2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
