Zoltán Borók-Nagy created IMPALA-9512:
-----------------------------------------

             Summary: Milestone 2: Validate each row against the valid write id list
                 Key: IMPALA-9512
                 URL: https://issues.apache.org/jira/browse/IMPALA-9512
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Zoltán Borók-Nagy


Minor compactions can compact several delta directories into a single delta 
directory. The current directory filtering algorithm needs to be modified to 
handle minor compacted directories and prefer those over plain delta 
directories.
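A minimal sketch of the directory filtering, assuming delta directory names of the form delta_<minWriteId>_<maxWriteId>[_<stmtId>]. The helper names (parse_delta, select_deltas) are hypothetical. Base directories, delete deltas and the valid write id list are ignored, so this is an illustration of the idea rather than Impala's actual filtering code. Deltas are sorted by min write id with wider ranges first, and a delta is kept only if it extends the write id range already covered, so a minor compacted delta wins over the plain deltas it contains.

```python
import re
from typing import List, NamedTuple, Optional

# Hypothetical pattern for names like "delta_0000001_0000010_0000";
# the optional third component is the statement id.
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)(?:_(\d+))?$")


class Delta(NamedTuple):
    name: str
    min_write_id: int
    max_write_id: int


def parse_delta(name: str) -> Optional[Delta]:
    """Return the parsed delta, or None for non-delta directories."""
    m = DELTA_RE.match(name)
    if m is None:
        return None
    return Delta(name, int(m.group(1)), int(m.group(2)))


def select_deltas(dir_names: List[str]) -> List[Delta]:
    """Prefer minor compacted (wider) deltas over the deltas they cover."""
    deltas = [d for d in map(parse_delta, dir_names) if d is not None]
    # Same min write id: the wider range (e.g. delta_1_10) sorts first.
    deltas.sort(key=lambda d: (d.min_write_id, -d.max_write_id))
    selected: List[Delta] = []
    covered_up_to = 0
    for d in deltas:
        if d.max_write_id > covered_up_to:
            selected.append(d)
            covered_up_to = d.max_write_id
        elif selected and (d.min_write_id, d.max_write_id) == (
                selected[-1].min_write_id, selected[-1].max_write_id):
            # Same range but a different statement id: distinct rows, keep both.
            selected.append(d)
    return selected
```

For example, given delta_0000001_0000001_0000, delta_0000002_0000002_0000, delta_0000001_0000010_0000 and delta_0000011_0000011_0000, only delta_0000001_0000010_0000 and delta_0000011_0000011_0000 are selected.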

On top of that, in minor compacted directories we need to filter out rows we 
cannot see. For example, we can have the following delta directory:

full_acid/delta_0000001_0000010_0000/0000 # minWriteId: 1
                                          # maxWriteId: 10

So this delta directory contains rows with write ids between 1 and 10, but the 
reader may only be allowed to see write ids less than 5. Therefore we need to 
check the ACID write id column (named originalTransaction) of each row to 
decide whether the row is valid.
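The per-row check could look roughly like the sketch below. It assumes a valid write id list made of a high watermark plus a set of exceptions (open or aborted write ids at or below the watermark), similar in spirit to Hive's valid write id lists; ValidWriteIdList, AcidRow and visible_rows are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Iterator, NamedTuple


@dataclass(frozen=True)
class ValidWriteIdList:
    """Hypothetical model: write ids <= high_watermark are visible,
    except the ones listed in `exceptions` (open or aborted)."""
    high_watermark: int
    exceptions: FrozenSet[int] = frozenset()

    def is_valid(self, write_id: int) -> bool:
        return (write_id <= self.high_watermark
                and write_id not in self.exceptions)


class AcidRow(NamedTuple):
    original_transaction: int  # the ACID write id column
    row_id: int
    payload: str


def visible_rows(rows: Iterable[AcidRow],
                 valid_ids: ValidWriteIdList) -> Iterator[AcidRow]:
    """Yield only the rows whose write id the reader is allowed to see."""
    for row in rows:
        if valid_ids.is_valid(row.original_transaction):
            yield row


# Made-up rows of delta_0000001_0000010_0000, one per write id.
rows = [AcidRow(w, 0, f"row{w}") for w in range(1, 11)]
print([r.original_transaction for r in visible_rows(rows, ValidWriteIdList(4))])
# prints [1, 2, 3, 4]
```

With the example above (write ids 1 to 10, only write ids less than 5 visible), a high watermark of 4 keeps the rows with write ids 1 to 4.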

There are several ways to optimize this. For example, based on the min/max 
write ids of the delta directory and the validWriteIdList, we can decide 
whether we need to validate the rows at all. Or, since rows are ordered by 
record ID, we can stop the scanner once we read past the high watermark (which 
tells us the max valid write id).
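Both optimizations can be sketched as follows, again with hypothetical names and the same assumed model of the valid write id list (a high watermark plus a set of open or aborted exceptions). classify_range looks only at the directory's min/max write ids to decide whether the directory can be skipped, read without checks, or needs per-row validation. scan_delta falls back to per-row checks and stops at the first write id above the high watermark, relying on rows being sorted by write id (the leading component of the record ID).

```python
from enum import Enum
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical row shape: (write_id, row_id), sorted by write id.
Row = Tuple[int, int]


class RangeResponse(Enum):
    NONE = "none"  # no write id in the range is visible: skip the directory
    ALL = "all"    # every write id in the range is visible: no per-row checks
    SOME = "some"  # mixed: each row must be validated


def classify_range(min_id: int, max_id: int, hwm: int,
                   exceptions: FrozenSet[int]) -> RangeResponse:
    """Decide from the delta's [min_id, max_id] range alone what to do."""
    upper = min(max_id, hwm)
    if min_id > upper:
        return RangeResponse.NONE
    # Invalid write ids inside the part of the range below the watermark.
    invalid = sum(1 for e in exceptions if min_id <= e <= upper)
    if invalid == upper - min_id + 1:
        return RangeResponse.NONE
    if upper == max_id and invalid == 0:
        return RangeResponse.ALL
    return RangeResponse.SOME


def scan_delta(rows: Iterable[Row], min_id: int, max_id: int, hwm: int,
               exceptions: FrozenSet[int] = frozenset()) -> List[Row]:
    response = classify_range(min_id, max_id, hwm, exceptions)
    if response is RangeResponse.NONE:
        return []
    if response is RangeResponse.ALL:
        return list(rows)
    visible: List[Row] = []
    for row in rows:
        write_id = row[0]
        if write_id > hwm:
            # Rows are sorted by write id, so no later row is visible.
            break
        if write_id not in exceptions:
            visible.append(row)
    return visible
```

For delta_0000001_0000010_0000 with a high watermark of 4, classify_range returns SOME, and scan_delta returns the rows with write ids 1 to 4 without reading beyond the first row with write id 5.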



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
