[ https://issues.apache.org/jira/browse/IMPALA-9512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zoltán Borók-Nagy reassigned IMPALA-9512:
-----------------------------------------
Assignee: Zoltán Borók-Nagy
> Milestone 2: Validate each row against the valid write id list
> --------------------------------------------------------------
>
> Key: IMPALA-9512
> URL: https://issues.apache.org/jira/browse/IMPALA-9512
> Project: IMPALA
> Issue Type: Sub-task
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Labels: impala-acid
>
> Minor compactions can compact several delta directories into a single delta
> directory. The current directory filtering algorithm needs to be modified to
> handle minor compacted directories and prefer those over plain delta
> directories.
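
A minimal sketch of the "prefer compacted directories" idea, assuming a delta directory can be dropped whenever a wider delta directory covers its whole write id range. The names `DeltaDir`, `parse_delta_dir` and `prefer_compacted` are hypothetical and are not Impala's actual directory filtering code. The sketch also ignores whether the wider directory is visible under the current valid write id list.

```python
import re
from typing import List, NamedTuple, Optional


class DeltaDir(NamedTuple):
    """Hypothetical model of a delta directory and its write id range."""
    name: str
    min_write_id: int
    max_write_id: int


# Matches delta_<min>_<max> with an optional trailing numeric suffix.
_DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)(?:_\d+)?$")


def parse_delta_dir(name: str) -> Optional[DeltaDir]:
    """Parse the write id range out of a delta directory name."""
    m = _DELTA_RE.match(name)
    if m is None:
        return None
    return DeltaDir(name, int(m.group(1)), int(m.group(2)))


def prefer_compacted(dirs: List[DeltaDir]) -> List[DeltaDir]:
    """Drop every delta whose range lies inside a strictly wider delta."""
    kept = []
    for d in dirs:
        width = d.max_write_id - d.min_write_id
        covered = any(
            o.min_write_id <= d.min_write_id
            and d.max_write_id <= o.max_write_id
            and (o.max_write_id - o.min_write_id) > width
            for o in dirs
        )
        if not covered:
            kept.append(d)
    return kept


names = [
    "delta_0000001_0000001_0000",
    "delta_0000002_0000002_0000",
    "delta_0000001_0000010",
    "delta_0000011_0000011_0000",
]
selected = prefer_compacted([parse_delta_dir(n) for n in names])
print([d.name for d in selected])
```
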
> On top of that, in minor compacted directories we need to filter out rows we
> cannot see. E.g. we can have the following delta directory:
> {noformat}
> full_acid/delta_0000001_0000010_0000/0000  # minWriteId: 1, maxWriteId: 10
> {noformat}
> This delta directory contains rows with write ids between 1 and 10, but a
> reader may only be allowed to see write ids less than 5. We therefore need to
> check the ACID write id column (named originalTransaction) of each row to
> decide whether the row is valid.
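
The per-row check described above could look roughly like the following sketch. `ValidWriteIds` is a simplified, hypothetical stand-in for the valid write id list: it assumes a write id is valid when it is at or below the high watermark and is not in a set of open or aborted ids. Rows are modelled as plain dicts keyed by the `originalTransaction` column, which is also an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Iterator


@dataclass(frozen=True)
class ValidWriteIds:
    """Hypothetical, simplified model of a valid write id list."""
    high_watermark: int
    invalid: FrozenSet[int] = frozenset()  # open or aborted write ids

    def is_valid(self, write_id: int) -> bool:
        return write_id <= self.high_watermark and write_id not in self.invalid


def visible_rows(rows: Iterable[Dict[str, int]],
                 valid: ValidWriteIds) -> Iterator[Dict[str, int]]:
    """Yield only the rows whose originalTransaction is a valid write id."""
    for row in rows:
        if valid.is_valid(row["originalTransaction"]):
            yield row


# Rows from a delta covering write ids 1..10; the reader may see ids below 5.
rows = [{"originalTransaction": w, "rowId": i}
        for i, w in enumerate([1, 3, 5, 10])]
seen = list(visible_rows(rows, ValidWriteIds(high_watermark=4)))
print([r["originalTransaction"] for r in seen])
```
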
> There are several ways to optimize this. For example, based on the min/max
> write ids of the delta directory and the validWriteIdList, we can decide
> whether we need to validate the rows at all. Alternatively, once we reach the
> high watermark (the max valid write id), we can stop the scanner, since rows
> are ordered by record ID.
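
Both optimizations can be sketched as follows. This is an illustration under stated assumptions, not the actual scanner logic: the valid write id list is reduced to a high watermark plus a set of invalid (open or aborted) ids, rows are plain dicts, and rows are assumed to arrive in ascending `originalTransaction` order, following the ordering claim above. `Visibility`, `classify_delta` and `scan_until_watermark` are hypothetical names.

```python
from enum import Enum
from typing import AbstractSet, Dict, Iterable, Iterator


class Visibility(Enum):
    ALL = "all"                # every row is visible, skip per-row checks
    NONE = "none"              # no row is visible, skip the directory
    CHECK_ROWS = "check_rows"  # rows must be validated one by one


def classify_delta(min_write_id: int, max_write_id: int,
                   high_watermark: int,
                   invalid_ids: AbstractSet[int]) -> Visibility:
    """Decide from the directory's min/max write ids whether rows need checks."""
    if min_write_id > high_watermark:
        return Visibility.NONE
    has_invalid = any(min_write_id <= w <= max_write_id for w in invalid_ids)
    if max_write_id <= high_watermark and not has_invalid:
        return Visibility.ALL
    return Visibility.CHECK_ROWS


def scan_until_watermark(rows: Iterable[Dict[str, int]],
                         high_watermark: int,
                         invalid_ids: AbstractSet[int]
                         ) -> Iterator[Dict[str, int]]:
    """Validate rows, stopping at the first write id above the watermark."""
    for row in rows:
        write_id = row["originalTransaction"]
        if write_id > high_watermark:
            break  # assumes ascending order, so later rows are invisible too
        if write_id not in invalid_ids:
            yield row


rows = [{"originalTransaction": w} for w in [1, 2, 5, 7]]
print(classify_delta(1, 10, 4, set()))
print([r["originalTransaction"]
       for r in scan_until_watermark(rows, 4, set())])
```

The early exit is only correct if the ordering assumption holds; without it, the scanner has to fall back to checking every row.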