[
https://issues.apache.org/jira/browse/IMPALA-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zoltán Borók-Nagy updated IMPALA-15041:
---------------------------------------
Description:
This is an alternative to IMPALA-15040.
To process Iceberg V3 tables we execute the following plan:
{noformat}
            UNION ALL
           /         \
          /           \
         /             \
    SCAN all      IcebergDeleteNode
    datafiles         /       \
    without          /         \
    deletes       SCAN        SCAN
                datafiles    deletes
                  with
                 deletes
{noformat}
IcebergDeleteNode deals with both position delete records and Deletion
Vectors. Since position delete files are being deprecated by Iceberg V3, in most
cases "SCAN deletes" will be empty, and IcebergDeleteNode only needs to deal
with Deletion Vectors.
DV evaluation could be pushed down to "SCAN datafiles with deletes". That would
be especially beneficial in the context of late materialization, in which case
we can skip materializing column values of inactive records.
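
The idea can be sketched roughly as follows. This is only an illustration, not Impala code: the function name scan_with_dv_pushdown, the in-memory column layout, and the use of a plain Python set as a stand-in for a Deletion Vector are all assumptions made for the example. It shows the scanner dropping deleted positions first, then evaluating the predicate, and only then materializing the remaining columns.

```python
from typing import Callable, Dict, List, Set


def scan_with_dv_pushdown(
    columns: Dict[str, List[object]],
    deleted_positions: Set[int],
    filter_column: str,
    predicate: Callable[[object], bool],
) -> List[Dict[str, object]]:
    """Hypothetical scanner: return live rows that satisfy the predicate."""
    num_rows = len(columns[filter_column])

    # Step 1: apply the deletion vector (here a set of row positions)
    # before reading any column values.
    live = [pos for pos in range(num_rows) if pos not in deleted_positions]

    # Step 2: evaluate the predicate on the filter column for live rows only.
    selected = [pos for pos in live if predicate(columns[filter_column][pos])]

    # Step 3: materialize all columns only for the surviving positions,
    # so values of deleted (inactive) records are never built.
    return [
        {name: values[pos] for name, values in columns.items()}
        for pos in selected
    ]


# Toy data file with five rows; positions 1 and 3 are marked deleted.
data = {
    "id": [10, 11, 12, 13, 14],
    "name": ["a", "b", "c", "d", "e"],
}
rows = scan_with_dv_pushdown(data, {1, 3}, "id", lambda v: v >= 12)
print(rows)
```

In this toy run only positions 2 and 4 are materialized; the deleted positions never reach the predicate or the row-building step.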
was:
This is an alternative to IMPALA-15040.
To process Iceberg V3 tables we execute the following plan:
{noformat}
            UNION ALL
           /         \
          /           \
         /             \
    SCAN all      IcebergDeleteNode
    datafiles         /       \
    without          /         \
    deletes       SCAN        SCAN
                datafiles    deletes
                  with
                 deletes
{noformat}
IcebergDeleteNode deals with position delete records and Deletion Vectors as
well. Since position delete files are being deprecated by Iceberg V3, in most
cases "SCAN deletes" will be empty, and IcebergDeleteNode only need to deal
with Deletion Vectors.
DV evaluation could be pushed down to "SCAN datafiles with deletes". The would
be especially beneficial in the context of late materialization, in which case
we can skip materializing column values of inactive records.
> Push down Deletion Vector evaluation to the scanners
> ----------------------------------------------------
>
> Key: IMPALA-15041
> URL: https://issues.apache.org/jira/browse/IMPALA-15041
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Zoltán Borók-Nagy
> Priority: Major
> Labels: impala-iceberg