[ 
https://issues.apache.org/jira/browse/PHOENIX-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994307#comment-15994307
 ] 

Vincent Poon commented on PHOENIX-3824:
---------------------------------------

[~lhofhansl] it turned out that the two are related.  Short summary: normally, 
when you update a data table row, you generate the index update in the 
preBatchMutate hook (so you can write it to the WAL).  To build the index 
update, you grab the current state of the row (since you're in preBatchMutate, 
that's the pre-update state of the row).  That way, you can figure out the 
existing index row, issue a Delete for it, and then Put the new index row.
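A rough sketch of that normal path, in plain Java rather than the actual 
Phoenix/HBase classes.  The class name, the indexRowKey helper and the 
"value|rowkey" key layout are all made up for illustration; only the 
Delete-then-Put idea comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class IndexUpdateSketch {
    // Hypothetical key layout: indexed value followed by the data row key.
    static String indexRowKey(String indexedValue, String dataRowKey) {
        return indexedValue + "|" + dataRowKey;
    }

    // currentValue is the indexed column as read in the pre-mutation hook,
    // i.e. before the new value is applied (null if the row does not exist).
    static List<String> indexUpdates(String dataRowKey, String currentValue, String newValue) {
        List<String> updates = new ArrayList<>();
        // Only issue a Delete when there is an old index row that differs.
        if (currentValue != null && !Objects.equals(currentValue, newValue)) {
            updates.add("DELETE " + indexRowKey(currentValue, dataRowKey));
        }
        updates.add("PUT " + indexRowKey(newValue, dataRowKey));
        return updates;
    }
}
```

For example, indexUpdates("row1", "a", "b") yields a Delete for "a|row1" 
followed by a Put for "b|row1".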

When you're doing an index rebuild, however, all your data table rows have 
already been written.  So when you "grab the current state of the row", it's 
the same as the mutation you're replaying.  Since nothing has 'changed', so to 
speak, the Delete isn't issued.  Hence you end up with the extra index row.
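To make the failure concrete, here is a toy simulation of the sequence from 
the issue description (disable index, update the data row, partial rebuild). 
It models the index as a plain set of strings; every name in it is 
hypothetical and it is not the Phoenix code path, just the same logic in 
miniature.

```java
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

public class RebuildReplaySketch {
    // Hypothetical key layout: indexed value followed by the data row key.
    static String indexRowKey(String indexedValue, String dataRowKey) {
        return indexedValue + "|" + dataRowKey;
    }

    // Delete the old index row only if the "current" state differs, then Put.
    static void applyIndexUpdate(Set<String> index, String rowKey,
                                 String currentValue, String newValue) {
        if (currentValue != null && !Objects.equals(currentValue, newValue)) {
            index.remove(indexRowKey(currentValue, rowKey));
        }
        index.add(indexRowKey(newValue, rowKey));
    }

    static Set<String> simulateRebuild() {
        Set<String> index = new TreeSet<>();
        // Index enabled: row1 is first written with value "a".
        applyIndexUpdate(index, "row1", null, "a");
        // Index disabled: the data table is updated to "b", index untouched.
        String dataTableValue = "b";
        // Partial rebuild replays "b", but the "current state" read from the
        // data table is already "b", so no Delete is issued for "a|row1".
        applyIndexUpdate(index, "row1", dataTableValue, "b");
        return index;
    }
}
```

simulateRebuild() ends with both "a|row1" and "b|row1" in the index, i.e. two 
index rows for the one data table row.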

PHOENIX-3806 then gets triggered because there's some logic to handle 
out-of-order updates.  The way it handles them is: if you get a mutation that 
isn't at the latest timestamp (i.e. backwards in time), the code then rolls up 
through each newer version up to the present.  That way you know the present 
index state, and if it has changed, you hide your current (back in time) index 
update by issuing a Delete after your Put.  If you have many versions, this 
"roll up" ends up being done for each one, hence the arithmetic summation 
problem.
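A back-of-the-envelope count of that cost, assuming (my simplification, not 
something measured) that replaying one version costs one step per newer 
version it has to roll up through.  The class and method names are made up.

```java
public class RollUpCostSketch {
    // Total roll-up steps when every one of `versions` versions is replayed
    // and each replay walks through all versions newer than itself.
    static long rollUpSteps(int versions) {
        long steps = 0;
        for (int replayed = 0; replayed < versions; replayed++) {
            for (int newer = replayed + 1; newer < versions; newer++) {
                steps++; // one step per newer version visited
            }
        }
        return steps;
    }
}
```

Under that assumption the total is 0 + 1 + ... + (n - 1) = n(n - 1)/2, so 
rollUpSteps(4) is 6 and rollUpSteps(1000) is 499500: quadratic in the number 
of versions of the row.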

I believe the simple fix is to make sure you don't scan for newer versions when 
you "grab the current state of the row".  There's actually code that tries to 
do that but I think there's a bug.  I'm still writing proper tests, etc, but I 
think that should fix it.
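The idea, sketched with a TreeMap standing in for the versions of one row 
(timestamp -> indexed value).  This is only an illustration of bounding the 
read by the mutation's timestamp; the names are hypothetical and the real fix 
in the Phoenix code may look quite different.

```java
import java.util.Map;
import java.util.TreeMap;

public class BoundedStateReadSketch {
    // State of the row strictly before the mutation's timestamp, ignoring
    // any versions at or after it (null if there is no older version).
    static String stateBefore(TreeMap<Long, String> versions, long mutationTs) {
        Map.Entry<Long, String> older = versions.lowerEntry(mutationTs);
        return older == null ? null : older.getValue();
    }

    // Unbounded read: whatever the newest version is, which during a rebuild
    // is the very mutation being replayed.
    static String latestState(TreeMap<Long, String> versions) {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }
}
```

With versions {10: "a", 20: "b"} and a replayed mutation at timestamp 20, 
latestState returns "b" (nothing appears to have changed), while 
stateBefore(versions, 20) returns "a", so the Delete for the old index row 
would be issued.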

I haven't figured out PHOENIX-3825, though.  I don't know if the code is built 
to handle that, and actually it's tricky to make it work with this one.

> Mutable Index partial rebuild adds more than one index row for updated data 
> row
> -------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3824
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3824
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Vincent Poon
>
> If you follow this sequence:
> 1) disable index
> 2) write an update to a data table row
> 3) trigger the BuildIndexScheduleTask partial rebuild
> then you end up with two index rows for the one data table row.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
