[ 
https://issues.apache.org/jira/browse/PHOENIX-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944089#comment-16944089
 ] 

Geoffrey Jacoby commented on PHOENIX-5502:
------------------------------------------

A few minutes after my last post, [~kadir] pointed out offline that there's a 
likely timestamp problem here: the deletes are issued at either the SCN or 
LATEST_TIMESTAMP, but the subsequent index rebuild writes cells with the 
original timestamps of the base table, so the newer delete markers mask the 
rebuilt cells.
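
To make the timestamp interaction concrete, here is a minimal toy model in 
Python. It is not HBase or Phoenix code: ToyColumn, rebuild_demo and the 
timestamps 100 and 200 are hypothetical, and the model ignores compactions, 
marker types and version limits. It only assumes that a delete marker hides 
every cell whose timestamp is less than or equal to the marker's.

```python
from dataclasses import dataclass, field


@dataclass
class ToyColumn:
    """Simplified stand-in for one HBase column (illustration only)."""
    puts: dict = field(default_factory=dict)            # timestamp -> value
    delete_markers: list = field(default_factory=list)  # marker timestamps

    def put(self, ts, value):
        self.puts[ts] = value

    def delete(self, ts):
        # Assumption: a marker at ts hides every version with timestamp <= ts.
        self.delete_markers.append(ts)

    def get(self):
        """Return the newest value not hidden by a delete marker, or None."""
        cutoff = max(self.delete_markers, default=-1)
        visible = {ts: v for ts, v in self.puts.items() if ts > cutoff}
        return visible[max(visible)] if visible else None


def rebuild_demo():
    base_ts = 100       # hypothetical timestamp of the original base-table upsert
    rebuild_time = 200  # hypothetical time at which the rebuild runs

    index_cell = ToyColumn()
    index_cell.put(base_ts, "15")    # index cell written alongside the base row
    before = index_cell.get()

    index_cell.delete(rebuild_time)  # rebuild first deletes the index rows "now"
    index_cell.put(base_ts, "15")    # then rewrites them at the base timestamp
    after = index_cell.get()
    return before, after
```

In this model rebuild_demo() finds the cell before the rebuild and nothing 
after it, because the rewritten cell sits at an older timestamp than the 
delete marker. That matches the empty index seen in the repro.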

So how best to handle the deletes? The options we came up with are: 
 1. Drop the index and recreate it
 2. HBase truncate (preferably with preserve regions)
 3. Write delete markers with the same ts as the index rows (not sure how well 
HBase handles this case)

For normal global indexes, option 2 seems best to me, since it's reasonably 
quick and preserves the regions, and because recreating the index is 
non-trivial absent PHOENIX-4286. 

The catch is that for view indexes we can't truncate, because each one is 
co-located with all the other view indexes of the same physical base table, so 
a truncate would wipe those too. For them we'd need option 1 or 3, and in 
option 1 the recreated view index would get a new view index id. 

> ALTER INDEX REBUILD removes all rows from already valid/consistent index
> ------------------------------------------------------------------------
>
>                 Key: PHOENIX-5502
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5502
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.14.1, 4.14.2, 4.14.3
>            Reporter: Priyank Porwal
>            Priority: Major
>             Fix For: 4.14.1, 4.14.2, 4.14.3
>
>
> Create Table & Indexes:
> CREATE TABLE DEMO2.PEOPLE (FNAME VARCHAR NOT NULL, LNAME VARCHAR, AGE 
> TINYINT, ZIP INTEGER, CONSTRAINT pk PRIMARY KEY (FNAME, LNAME));
>  CREATE INDEX PEOPLE_BY_ZIP ON DEMO2.PEOPLE(ZIP);
>  CREATE INDEX PEOPLE_BY_AGE ON DEMO2.PEOPLE(AGE);
> Populate Data:
> UPSERT INTO DEMO2.PEOPLE VALUES ('Audi', 'Q5', 15, 65000);
> UPSERT INTO DEMO2.PEOPLE VALUES ('Volkswagon', 'Beetle', 10, 43130);
> UPSERT INTO DEMO2.PEOPLE VALUES ('BMW', 'X3', 4, 15030);
> Query Index:
> SELECT * FROM DEMO2.PEOPLE_BY_AGE;
> <3 rows show up>
> Rebuild Index:
> alter index people_by_age on DEMO2.people rebuild;
> Query Index Again:
> SELECT * FROM DEMO2.PEOPLE_BY_AGE;
> <No rows show up>
>  
> It seems that if the index is already consistent, then the rebuild command 
> removes all the index rows. Above is the simpler repro, but I have noticed 
> similar behavior where the rebuild command does the right thing the first 
> time on an inconsistent index (caused by truncating the table using the 
> hbase shell), but a second run of the rebuild command removes all the rows.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
