[ 
https://issues.apache.org/jira/browse/PHOENIX-5597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984046#comment-16984046
 ] 

Kadir OZDEMIR edited comment on PHOENIX-5597 at 11/28/19 5:03 AM:
------------------------------------------------------------------

This bug was introduced by PHOENIX-5539 and PHOENIX-5540, which changed the 
order of the full index row write. Instead of writing the full index row in 
the first phase, the full index row is now written in the last phase. So, in 
the first phase, only the empty column with the unverified status is written. 
In the last phase, the full index row is written.

However, if the last phase fails to complete (i.e., the index row is left 
in the unverified status), the index row will include only the data row key 
columns, the indexed columns and the empty column. The covered columns will not 
be included in the unverified index row. Now, if a query includes a condition 
on a covered column, the scan for this query will filter out rows based on this 
condition. The scan will therefore not return the unverified rows: these rows 
do not include the covered columns, so the condition cannot hold. As a result, 
these unverified rows are skipped and never repaired.
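
The failure mode described above can be sketched with a small toy model. This 
is not Phoenix code: the function names, the "_status" key standing in for the 
empty column, and the dict-based rows are all hypothetical and only mirror the 
example schema from the issue description (id, val1 indexed, val2 and val3 
covered).

```python
# Toy model only, not Phoenix code. All names here are hypothetical.
UNVERIFIED, VERIFIED = "unverified", "verified"


def write_index_row(data_row, full_row_first, fail_last_phase):
    """Return the index row left behind by a multi-phase index write."""
    key_and_indexed = {"id": data_row["id"], "val1": data_row["val1"]}
    covered = {"val2": data_row["val2"], "val3": data_row["val3"]}
    if full_row_first:
        # Original order: the first phase writes the full row as unverified.
        row = {**key_and_indexed, **covered, "_status": UNVERIFIED}
        if not fail_last_phase:
            row["_status"] = VERIFIED
    else:
        # Changed order: the first phase writes only the empty column
        # (plus the row key, which carries the key and indexed columns).
        row = {**key_and_indexed, "_status": UNVERIFIED}
        if not fail_last_phase:
            row = {**row, **covered, "_status": VERIFIED}
    return row


def scan(index_rows, val1, val2):
    """Filter first; read repair can only see rows that pass the filter."""
    returned = [
        r for r in index_rows
        if r["val1"] == val1 and r.get("val2") == val2
    ]
    needs_repair = [r for r in returned if r["_status"] == UNVERIFIED]
    return returned, needs_repair


data = {"id": "k1", "val1": "ab", "val2": "abc", "val3": "x"}
old_order_row = write_index_row(data, full_row_first=True, fail_last_phase=True)
new_order_row = write_index_row(data, full_row_first=False, fail_last_phase=True)
```

In this sketch, scan([old_order_row], "ab", "abc") returns the unverified row, 
so it can be handed to read repair, while scan([new_order_row], "ab", "abc") 
returns nothing because the row has no val2 to match against.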

I have verified this with an integration test. To fix this, we need to revert 
PHOENIX-5539 and PHOENIX-5540 (and also the latest patch for PHOENIX-5527). 
These changes were only optimizations, so we can safely revert them.

[~larsh], [~ckulkarni], [~gjacoby], [~vincentpoon], I will do a manual revert 
and post the PR as soon as all the tests pass locally for me, likely within a 
day.



> No read repair happens when scans filter rows based on a covered column
> -----------------------------------------------------------------------
>
>                 Key: PHOENIX-5597
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5597
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.0.0, 4.14.3
>            Reporter: Kadir OZDEMIR
>            Assignee: Kadir OZDEMIR
>            Priority: Blocker
>             Fix For: 4.15.0, 5.1.0
>
>         Attachments: PHOENIX-5597.4.x-HBase-1.5.001.patch, 
> PHOENIX-5597.master.001.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Assume that the schema for a data table and its index is as follows:
> create table datatable (id varchar(10) not null primary key, val1 
> varchar(10), val2 varchar(10), val3 varchar(10))
> create index indextable on datatable (val1) include (val2, val3)
> A query that filters rows on a covered column does not trigger index read 
> repair for unverified index rows. For example, the following query will not 
> trigger read repair:
> select val2, val3 from datatable WHERE val1 = 'ab' and val2 = 'abc'
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
