[ 
https://issues.apache.org/jira/browse/PHOENIX-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957987#comment-16957987
 ] 

Kadir OZDEMIR commented on PHOENIX-5535:
----------------------------------------

[~gjacoby], I am going to revert the patch for PHOENIX-5535 one more time, for 
two reasons. The first is the failing tests. The second is that doing raw 
scans and replaying all versions of a row just to create null values seems 
like overkill and makes rebuilding a row more expensive. I think there is a 
much cleaner, simpler, and more efficient solution: inserting delete markers 
on the regular indexing data path. After forming index updates from the data 
table rows, the indexing path should check whether any covered column cells 
are still missing. If so, it should always generate delete markers for them 
(i.e., for both replays and regular updates). I think this should have been 
done in the first place. It also prevents other issues, because delete markers 
eliminate the need to rely on the previous version of the index row for the 
missing covered columns. Because we write full rows in the last write phase, 
we can safely add delete markers that delete all versions of a column instead 
of only this version, which is more expensive.
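
To make the proposal concrete, here is a minimal sketch of the check described 
above. It is not Phoenix code: the class, the IndexCell type, and 
buildIndexCells are hypothetical names made up for illustration, and a plain 
map stands in for the data table row. It only shows the idea that every 
covered column missing from the data row gets a delete marker in the index 
update instead of being left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CoveredColumnDeleteMarkers {

    /** Hypothetical stand-in for one cell of an index row update (not an HBase type). */
    public static final class IndexCell {
        public final String column;
        public final String value;          // null when this cell is a delete marker
        public final boolean deleteMarker;  // true: delete all versions of the column

        private IndexCell(String column, String value, boolean deleteMarker) {
            this.column = column;
            this.value = value;
            this.deleteMarker = deleteMarker;
        }

        @Override
        public String toString() {
            return deleteMarker ? "DELETE_ALL_VERSIONS(" + column + ")"
                                : "PUT(" + column + "=" + value + ")";
        }
    }

    /**
     * Forms the covered-column part of an index update from a data table row.
     * A covered column with no value in the data row produces a delete marker,
     * so the index row never depends on an older version of that column.
     */
    public static List<IndexCell> buildIndexCells(List<String> coveredColumns,
                                                  Map<String, String> dataRow) {
        List<IndexCell> cells = new ArrayList<>();
        for (String column : coveredColumns) {
            String value = dataRow.get(column);
            if (value != null) {
                cells.add(new IndexCell(column, value, false));
            } else {
                // Missing covered column: always emit a delete marker,
                // for replays and regular updates alike.
                cells.add(new IndexCell(column, null, true));
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        // Data row where covered column C is null (absent) and D has a value.
        Map<String, String> dataRow = new HashMap<>();
        dataRow.put("D", "d1");
        for (IndexCell cell : buildIndexCells(Arrays.asList("C", "D"), dataRow)) {
            System.out.println(cell);
        }
    }
}
```

In the real indexing path the marker would be an HBase delete of all versions 
of the column on the index row; the sketch only models the decision of when to 
generate one.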

> Index rebuilds via UngroupedAggregateRegionObserver should replay delete 
> markers
> --------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5535
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5535
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.0.0, 4.14.3
>            Reporter: Kadir OZDEMIR
>            Assignee: Kadir OZDEMIR
>            Priority: Blocker
>             Fix For: 4.15.0, 5.1.0
>
>         Attachments: PHOENIX-5535.4.x-HBase-1.5.001.patch, 
> PHOENIX-5535.master.001.patch, PHOENIX-5535.master.002.patch, 
> PHOENIX-5535.master.003.patch
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently index rebuilds for global index tables are done on the server side. 
> The Phoenix client generates an aggregate plan using ServerBuildIndexCompiler 
> to scan every data table row on the server side. This compiler sets the scan 
> attributes so that the row mutations scanned by 
> UngroupedAggregateRegionObserver are replayed on the data table and the index 
> table rows are rebuilt. During this replay, data table row updates are 
> skipped and only index table rows are updated.
> Phoenix allows column entries to have null values. Null values are 
> represented by HBase column delete markers. This means that an index rebuild 
> must replay these delete markers along with put mutations. To do that, 
> ServerBuildIndexCompiler should use raw scans, but currently it uses 
> regular scans. This leads to incorrect index rebuilds when null values are 
> used.
> A simple test with a data table that has one global index with a covered 
> column that can take a null value is sufficient to reproduce this problem.
>  # Create a data table with columns a, b, and c, where a is the primary key 
> and c can be null
>  # Write one row with non-null values
>  # Overwrite the covered column with null (i.e., set it to null)
>  # Create an index on the table where b is the secondary key and c is a 
> covered column
>  # Rebuild the index
>  # Dump the index table
> The index table row should have a null value for the covered column. 
> However, it has the non-null value written at step 2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
