James Taylor created PHOENIX-3847:
-------------------------------------
Summary: Handle out of order rows during index maintenance
Key: PHOENIX-3847
URL: https://issues.apache.org/jira/browse/PHOENIX-3847
Project: Phoenix
Issue Type: Bug
Reporter: James Taylor
Based on the investigation and work done in PHOENIX-3825, plus the existence of the ignoreNewerMutations flag, it seems that out-of-order rows are not handled correctly during index maintenance. Regardless of the order in which the server processes data table mutations, the resulting index rows should be the same and should be based purely on the cell timestamps of the data rows. Ideally, we shouldn't need the ignoreNewerMutations flag at all. Perhaps that was the intent of IndexUpdateManager.fixUpCurrentUpdates(), but it doesn't seem to be working.

Would it work to simply generate the index rows for all versions of the mutating data rows? We should walk through a series of examples to see if this would work. For example, with the following data table:

|Type|RowKey|Value|Timestamp|
|Put|1|A|1000|
|Put|1|C|3000|

the index table would look like this:

|Type|RowKey|Timestamp|
|Put|A,1|1000|
|Del|A,1|3000|
|Put|C,1|3000|

Then, if a Put comes in out of order at timestamp 2000, the data table would look like this:

|Type|RowKey|Value|Timestamp|
|Put|1|A|1000|
|Put|1|B|2000|
|Put|1|C|3000|

and the index table should look like this:

|Type|RowKey|Timestamp|
|Put|A,1|1000|
|Del|A,1|2000|
|Put|B,1|2000|
|Del|B,1|3000|
|Put|C,1|3000|

Given that we can't reverse Delete markers, I'm not sure we can get there completely: we'd still have a Delete of A,1 @ 3000. But perhaps this is not a problem? We'd need to play this out further, including scenarios with row deletes as well.
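To make the "regenerate all versions" idea concrete, here is a minimal Java sketch of a toy model, not Phoenix code. It assumes one data row with a single indexed column, represented as a map from cell timestamp to value; mutations are plain strings in the `Type|IndexRowKey|Timestamp` shape of the tables above; the index row key is `value,rowKey`; and row deletes are not modeled. `IndexRowSketch`, `indexMutations`, and `staleMutations` are hypothetical names and do not reflect how IndexUpdateManager works. Rebuilding from every version, ordered purely by timestamp, reproduces both index tables, and comparing the two rebuilds isolates the mutation that can't be undone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model; names are illustrative, not Phoenix APIs.
class IndexRowSketch {

    // Index row key mirrors the tables above: "value,rowKey".
    static String indexKey(String value, String rowKey) {
        return value + "," + rowKey;
    }

    // Regenerates every index mutation for one data row from ALL of its
    // versions. TreeMap iterates in ascending timestamp order, so the result
    // depends only on cell timestamps, not on arrival order.
    static List<String> indexMutations(String rowKey, TreeMap<Long, String> versions) {
        List<String> out = new ArrayList<>();
        String previous = null;
        for (Map.Entry<Long, String> e : versions.entrySet()) {
            long ts = e.getKey();
            String value = e.getValue();
            if (previous != null) {
                // The new version supersedes the prior index row at the same timestamp.
                out.add("Del|" + indexKey(previous, rowKey) + "|" + ts);
            }
            out.add("Put|" + indexKey(value, rowKey) + "|" + ts);
            previous = value;
        }
        return out;
    }

    // Mutations already written before the out-of-order Put that the full
    // rebuild no longer produces. Since Delete markers can't be reversed,
    // these would remain in the index table.
    static List<String> staleMutations(List<String> before, List<String> after) {
        List<String> stale = new ArrayList<>(before);
        stale.removeAll(after);
        return stale;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> versions = new TreeMap<>();
        versions.put(1000L, "A");
        versions.put(3000L, "C");
        List<String> before = indexMutations("1", versions);

        versions.put(2000L, "B"); // out-of-order Put arrives after C@3000
        List<String> after = indexMutations("1", versions);

        System.out.println(before);
        System.out.println(after);
        System.out.println(staleMutations(before, after));
    }
}
```

Running `main` prints the first index table, then the desired table after B@2000 arrives, and finally `[Del|A,1|3000]`: the leftover Delete the issue expects to remain. Whether that leftover marker is harmless, and how row deletes fit in, is exactly what the examples still need to play out.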