[jira] [Commented] (PHOENIX-3847) Handle out of order rows during index maintenance

Vincent Poon (JIRA) Fri, 12 May 2017 16:56:19 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008938#comment-16008938
 ]


Vincent Poon commented on PHOENIX-3847:
---------------------------------------

I guess point queries wouldn't work then.  Hmm, yeah, not sure we can get 
around the extra Delete of A,1 at 3000.

> Handle out of order rows during index maintenance
> -------------------------------------------------
>
>                 Key: PHOENIX-3847
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3847
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>
> Based on the investigation and work done in PHOENIX-3825 plus the existence 
> of the ignoreNewerMutations flag, it seems that out of order rows are not 
> handled correctly during index maintenance. When the user handles replaying 
> failed batches, we force them to submit them in timestamp order. As long as 
> the user provides the original timestamp, the order shouldn't matter. 
> Regardless of the order in which the server processes data table mutations, 
> the resulting index rows should be the same and should be based purely on the 
> cell timestamps of the data rows. Ideally, we shouldn't need the 
> ignoreNewerMutations flag at all. Perhaps that was the intent with 
> IndexUpdateManager.fixUpCurrentUpdates(), but it doesn't seem to be working.
> Would it work to simply generate all the index rows for the mutating data 
> rows for all versions? We should walk through a series of examples to see if 
> this would work.  For example, with the following data table:
> |Type|RowKey|Value|Timestamp
> | Put | 1 | A | 1000
> | Put | 1 | C | 3000
> the index table would look like this:
> |Type|RowKey|Timestamp
> | Put | A,1 | 1000
> | Del | A,1 | 3000
> | Put | C,1 | 3000
> Then if a Put comes in out of order at 2000, the data table would look like 
> this:
> |Type|RowKey|Value|Timestamp
> | Put | 1 | A | 1000
> | Put | 1 | B | 2000
> | Put | 1 | C | 3000
> and the index table should look like this:
> |Type|RowKey|Timestamp
> | Put | A,1 | 1000
> | Del | A,1 | 2000
> | Put | B,1 | 2000
> | Del | B,1 | 3000
> | Put | C,1 | 3000
> Given that we can't reverse Delete markers, I'm not sure we can get there 
> completely. We'd still have a Delete of A,1 @ 3000. But perhaps this is not a 
> problem? We'd need to play this out further and include scenarios with row 
> delete as well.
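
The example above can be played out with a small sketch. This is not Phoenix 
code: index_rows and the (type, row key, timestamp) tuples are hypothetical 
names made up for illustration, and the model assumes a single indexed column 
and that index cells, once written, are never retracted. It derives the index 
cells purely from the data row versions, then compares the ideal result with 
what is left when the Put at 2000 arrives after the index has already been 
built for 1000 and 3000.

```python
from typing import List, Set, Tuple

# (type, index row key, timestamp); a simplified stand-in for an index cell.
Cell = Tuple[str, str, int]


def index_rows(row_key: str, versions: List[Tuple[str, int]]) -> Set[Cell]:
    """Derive index cells from all versions of one data row.

    Hypothetical helper: each version yields a Put of "value,rowkey" at its
    timestamp, plus a Del of the previous version's index row at that same
    timestamp.
    """
    cells: Set[Cell] = set()
    prev_value = None
    for value, ts in sorted(versions, key=lambda v: v[1]):
        if prev_value is not None and prev_value != value:
            cells.add(("Del", f"{prev_value},{row_key}", ts))
        cells.add(("Put", f"{value},{row_key}", ts))
        prev_value = value
    return cells


# Index state before the out-of-order Put arrives.
before = index_rows("1", [("A", 1000), ("C", 3000)])

# Index state we would want once the Put of B at 2000 is included.
ideal = index_rows("1", [("A", 1000), ("B", 2000), ("C", 3000)])

# Assumption: delete markers cannot be reversed, so cells only accumulate.
actual = before | ideal

leftover = actual - ideal
print(sorted(leftover))
```

Under these assumptions the only cell in actual that is not in ideal is the 
Delete of A,1 at 3000, which matches the concern raised in the description. 
Row deletes are not modeled here.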



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
