[
https://issues.apache.org/jira/browse/PHOENIX-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955907#comment-16955907
]
Kadir OZDEMIR commented on PHOENIX-5535:
----------------------------------------
Actually, replaying delete markers is not sufficient, as there is no guarantee
that the delete markers will still be there when the rebuild runs; they can be
removed by HBase compactions. When PHOENIX-5018 unified the partial index
builder and the index rebuild, the underlying assumption was that the partial
builder is correct. The only missing step was replaying delete markers, which
the partial builder does by using a raw scan of the data table. While working
on this bug, I noticed that IndexTool does not use delete markers for full
builds, so I thought that would be the fix for handling Phoenix-level null
column values. It did work, but only because compaction was not happening in my
tests. This means the bug reported here should be retitled "Index rebuilds
should replay null column values". The bug applies to both the partial rebuild
and the full rebuild that replays HBase mutations. This includes the automatic
partial rebuild and the partial rebuild initiated by IndexTool, as well as the
full rebuild that replays mutations (i.e., by leveraging
UngroupedAggregateRegionObserver).
The solution should be to retrieve mutations with a raw scan and then add delete
markers for the missing columns. [~vincentpoon], [~gjacoby], [~larsh],
[~ChinmayKulkarni], [~giacomotaylor] any comments on this?
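The proposed fix (retrieve mutations with a raw scan, then add delete markers for the missing columns) can be sketched with a small in-memory model. Everything in it is hypothetical: `NullReplaySketch`, `Cell`, `withInferredDeletes`, and `replay` are illustrative names, not HBase or Phoenix APIs. The sketch assumes that (1) "missing columns" means indexed or covered columns with no cell at all in the raw scan, (2) an inferred marker takes the row's latest timestamp, and (3) the replay merges into the existing index row, so columns it never mentions keep their old value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of a data row and an index row; NOT the HBase or Phoenix API.
final class NullReplaySketch {
    enum Type { PUT, DELETE_COLUMN }

    // One cell of a data table row as a raw scan would return it.
    record Cell(String column, long ts, Type type, String value) {}

    // For every indexed or covered column with no cell at all in the raw scan
    // (e.g. compaction purged both the delete marker and the older put), add an
    // explicit DELETE_COLUMN marker. Assumption: the marker takes the row's latest timestamp.
    static List<Cell> withInferredDeletes(List<Cell> rawCells, Set<String> indexColumns) {
        long latestTs = rawCells.stream().mapToLong(Cell::ts).max().orElse(0L);
        List<Cell> result = new ArrayList<>(rawCells);
        for (String col : indexColumns) {
            boolean present = rawCells.stream().anyMatch(c -> c.column().equals(col));
            if (!present) {
                result.add(new Cell(col, latestTs, Type.DELETE_COLUMN, null));
            }
        }
        return result;
    }

    // Replays cells in timestamp order onto an existing index row. Assumption:
    // columns not mentioned by any cell keep their previous value, so a null
    // that is never replayed leaves the old value in place.
    static Map<String, String> replay(Map<String, String> indexRow, List<Cell> cells) {
        Map<String, String> out = new HashMap<>(indexRow);
        cells.stream()
             .sorted(Comparator.comparingLong(Cell::ts))
             .forEach(c -> {
                 if (c.type() == Type.PUT) {
                     out.put(c.column(), c.value());
                 } else {
                     out.remove(c.column());
                 }
             });
        return out;
    }

    public static void main(String[] args) {
        // After compaction, the raw scan of row a = 1 shows only the put of b;
        // the null written to c is gone along with the older put of c.
        List<Cell> raw = List.of(new Cell("b", 2L, Type.PUT, "x"));
        Map<String, String> indexRow = Map.of("b", "x", "c", "y");
        Set<String> indexColumns = Set.of("b", "c");

        // c keeps the stale value y: nothing in the raw scan says it became null.
        System.out.println("raw scan only:        " + replay(indexRow, raw));
        // The inferred marker for c clears it from the index row.
        System.out.println("with inferred delete: " + replay(indexRow, withInferredDeletes(raw, indexColumns)));
    }
}
```

Which timestamp the inferred marker should carry is the main open choice in this model; using the row's latest timestamp is only one option.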
> Index rebuilds via UngroupedAggregateRegionObserver should replay delete
> markers
> --------------------------------------------------------------------------------
>
> Key: PHOENIX-5535
> URL: https://issues.apache.org/jira/browse/PHOENIX-5535
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 5.0.0, 4.14.3
> Reporter: Kadir OZDEMIR
> Assignee: Kadir OZDEMIR
> Priority: Blocker
> Fix For: 4.15.0, 5.1.0
>
> Attachments: PHOENIX-5535.master.001.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, index rebuilds for global index tables are done on the server side.
> The Phoenix client generates an aggregate plan using ServerBuildIndexCompiler to
> scan every data table row on the server side. This compiler sets the scan
> attributes so that the row mutations scanned by
> UngroupedAggregateRegionObserver are then replayed on the data table, so that
> index table rows are rebuilt. During this replay, data table row updates are
> skipped and only index table rows are updated.
> Phoenix allows column entries to have null values. Null values are
> represented by HBase column delete markers. This means that an index rebuild
> must replay these delete markers along with put mutations. To do that,
> ServerBuildIndexCompiler should use raw scans, but currently it uses
> regular scans. This leads to incorrect index rebuilds when null values are used.
> A simple test with a data table that has one global index with a covered column
> that can take a null value is sufficient to reproduce this problem.
> # Create a data table with columns a, b, and c, where a is the primary key
> and c can have a null value
> # Write one row with non-null values
> # Overwrite the covered column with null (i.e., set it to null)
> # Create an index on the table where b is the secondary key and c is a covered
> column
> # Rebuild the index
> # Dump the index table
> The index table row should have the null value for the covered column.
> However, it has the non-null value written at step 2.
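To illustrate why the scan type matters, here is a simplified in-memory model of steps 2 and 3 of the reproduction (row a = 1, b = 'x', c first 'y' and then null). It assumes a Phoenix null is stored as a column delete marker that hides all older versions of that column, and it models only a single-version regular scan. `ScanVisibilitySketch`, `rawScan`, and `regularScan` are hypothetical names, not HBase client calls; in the real HBase client a raw scan is requested with `Scan#setRaw(true)`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model of one HBase row; NOT the HBase client API.
final class ScanVisibilitySketch {
    enum Type { PUT, DELETE_COLUMN }

    record Cell(String column, long ts, Type type, String value) {}

    // Raw scan: every cell is returned, including delete markers (until compaction purges them).
    static List<Cell> rawScan(List<Cell> row) {
        return new ArrayList<>(row);
    }

    // Regular scan (one version): per column, the newest PUT that is not covered by a
    // DELETE_COLUMN marker with an equal or newer timestamp. Markers are never returned.
    static List<Cell> regularScan(List<Cell> row) {
        List<Cell> visible = new ArrayList<>();
        row.stream().map(Cell::column).distinct().sorted().forEach(col -> {
            long deletedUpTo = row.stream()
                .filter(c -> c.column().equals(col) && c.type() == Type.DELETE_COLUMN)
                .mapToLong(Cell::ts).max().orElse(Long.MIN_VALUE);
            Optional<Cell> latestPut = row.stream()
                .filter(c -> c.column().equals(col) && c.type() == Type.PUT && c.ts() > deletedUpTo)
                .max(Comparator.comparingLong(Cell::ts));
            latestPut.ifPresent(visible::add);
        });
        return visible;
    }

    public static void main(String[] args) {
        // Steps 2 and 3 of the reproduction for the row with primary key a = 1:
        // write b and c at ts 1, then set c to null at ts 2.
        List<Cell> row = List.of(
            new Cell("b", 1L, Type.PUT, "x"),
            new Cell("c", 1L, Type.PUT, "y"),
            new Cell("c", 2L, Type.DELETE_COLUMN, null));

        // Only b is visible; nothing tells the rebuild that c became null.
        System.out.println("regular: " + regularScan(row));
        // The delete marker for c at ts 2 is returned and can be replayed.
        System.out.println("raw:     " + rawScan(row));
    }
}
```

As noted in the comment above, this only helps while the marker still exists: once compaction has purged it, even a raw scan no longer returns anything for c.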
--
This message was sent by Atlassian Jira
(v8.3.4#803005)