[ 
https://issues.apache.org/jira/browse/PHOENIX-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kadir OZDEMIR updated PHOENIX-5535:
-----------------------------------
    Summary: Replay delete markers during server side global index rebuild   
(was: Index rebuilds via UngroupedAggregateRegionObserver should replay delete 
markers)

> Replay delete markers during server side global index rebuild 
> --------------------------------------------------------------
>
>                 Key: PHOENIX-5535
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5535
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.0.0, 4.14.3
>            Reporter: Kadir OZDEMIR
>            Assignee: Kadir OZDEMIR
>            Priority: Blocker
>             Fix For: 4.15.0, 5.1.0
>
>         Attachments: PHOENIX-5535.4.x-HBase-1.5.001.patch, 
> PHOENIX-5535.master.001.patch, PHOENIX-5535.master.002.patch, 
> PHOENIX-5535.master.003.patch
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently, index rebuilds for global index tables are done on the server side. 
> The Phoenix client generates an aggregate plan using ServerBuildIndexCompiler 
> to scan every data table row on the server side. This compiler sets the scan 
> attributes so that the row mutations scanned by 
> UngroupedAggregateRegionObserver are replayed on the data table and the 
> index table rows are rebuilt. During this replay, data table row updates are 
> skipped and only index table rows are updated.
> Phoenix allows column entries to have null values. A null value is 
> represented by an HBase column delete marker. This means that an index 
> rebuild must replay these delete markers along with put mutations. To do 
> that, ServerBuildIndexCompiler should use raw scans, but it currently uses 
> regular scans. This leads to incorrect index rebuilds when null values are used.
> A simple test with a data table that has one global index with a covered 
> column that can take a null value is sufficient to reproduce this problem.
>  # Create a data table with columns a, b, and c, where a is the primary key 
> and c can have a null value
>  # Write one row with non-null values
>  # Overwrite the covered column with null (i.e., set it to null) 
>  # Create an index on the table where b is the secondary key and c is a 
> covered column
>  # Rebuild the index
>  # Dump the index table
> The index table row should have the null value for the covered column. 
> However, it has the non-null value written at step 2.  
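
The data table state after steps 1 to 3 can be sketched with a small toy model that shows why a regular scan cannot hand delete markers to the replay. This is not Phoenix or HBase code: the `Cell` class, the `raw_scan` and `regular_scan` functions, the row key `a1`, and the timestamps are all hypothetical names and values chosen for illustration, and the model assumes one retained version per column with distinct timestamps.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for a stored cell; not an HBase class."""
    row: str
    column: str
    timestamp: int
    value: Optional[str] = None
    is_delete_marker: bool = False


def raw_scan(cells: List[Cell]) -> List[Cell]:
    """Return every stored cell, delete markers included, newest first per column."""
    return sorted(cells, key=lambda c: (c.row, c.column, -c.timestamp))


def regular_scan(cells: List[Cell]) -> List[Cell]:
    """Return only the newest cell per column, and only if it is a put."""
    visible = []
    seen = set()
    for cell in raw_scan(cells):
        key = (cell.row, cell.column)
        if key in seen:
            continue  # a newer cell already decided this column
        seen.add(key)
        if not cell.is_delete_marker:
            visible.append(cell)
    return visible


# Steps 2 and 3 of the reproduction: write b and c, then set c to null,
# which is stored as a delete marker on column c.
cells = [
    Cell("a1", "b", 1, "b1"),
    Cell("a1", "c", 1, "c1"),
    Cell("a1", "c", 2, is_delete_marker=True),
]

for label, scan in (("regular", regular_scan), ("raw", raw_scan)):
    markers = [c for c in scan(cells) if c.is_delete_marker]
    print(label, "scan returns", len(markers), "delete marker(s)")
```

In this model the regular scan returns only column b, so a replay driven by it never sees the delete marker on c; the raw scan returns all three cells, including the marker that the rebuild needs to replay.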



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
