[ https://issues.apache.org/jira/browse/IMPALA-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy updated IMPALA-15041:
---------------------------------------
    Description: 
This is an alternative to IMPALA-15040.

To process Iceberg V3 tables we execute the following plan:
{noformat}
       UNION ALL
      /        \
     /          \
    /            \
   SCAN all  IcebergDeleteNode
   datafiles  /      \
   without   /        \
   deletes  SCAN      SCAN
            datafiles deletes
            with
            deletes
{noformat}

IcebergDeleteNode handles both position delete records and Deletion Vectors 
(DVs). Since position delete files are being deprecated by Iceberg V3, in most 
cases "SCAN deletes" will be empty, and IcebergDeleteNode only needs to deal 
with Deletion Vectors.

DV evaluation could be pushed down to "SCAN datafiles with deletes". That would 
be especially beneficial in the context of late materialization, in which case 
we can skip materializing column values of inactive records.

  was:
This is an alternative to IMPALA-15040.

To process Iceberg V3 tables we execute the following plan:
{noformat}
       UNION ALL
      /        \
     /          \
    /            \
   SCAN all  IcebergDeleteNode
   datafiles  /      \
   without   /        \
   deletes  SCAN      SCAN
            datafiles deletes
            with
            deletes
{noformat}

IcebergDeleteNode deals with position delete records and Deletion Vectors as 
well. Since position delete files are being deprecated by Iceberg V3, in most 
cases "SCAN deletes" will be empty, and IcebergDeleteNode only need to deal 
with Deletion Vectors.

DV evaluation could be pushed down to "SCAN datafiles with deletes". The would 
be especially beneficial in the context of late materialization, in which case 
we can skip materializing column values of inactive records.


> Push down Deletion Vector evaluation to the scanners
> ----------------------------------------------------
>
>                 Key: IMPALA-15041
>                 URL: https://issues.apache.org/jira/browse/IMPALA-15041
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>
> This is an alternative to IMPALA-15040.
> To process Iceberg V3 tables we execute the following plan:
> {noformat}
>        UNION ALL
>       /        \
>      /          \
>     /            \
>    SCAN all  IcebergDeleteNode
>    datafiles  /      \
>    without   /        \
>    deletes  SCAN      SCAN
>             datafiles deletes
>             with
>             deletes
> {noformat}
> IcebergDeleteNode handles both position delete records and Deletion Vectors 
> (DVs). Since position delete files are being deprecated by Iceberg V3, in most 
> cases "SCAN deletes" will be empty, and IcebergDeleteNode only needs to deal 
> with Deletion Vectors.
> DV evaluation could be pushed down to "SCAN datafiles with deletes". That 
> would be especially beneficial in the context of late materialization, in 
> which case we can skip materializing column values of inactive records.
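
The pushdown idea described in the issue can be illustrated with a minimal Python sketch. It is not Impala code: `CountingColumn` and `scan_with_deletion_vector` are hypothetical names, and the Deletion Vector is modeled as a plain set of row positions instead of the bitmap a real DV would use. The scanner checks each row position against the DV first and decodes column values only for rows that survive, which is the saving that late materialization makes possible.

```python
from typing import Dict, List, Set


class CountingColumn:
    """Hypothetical column reader that counts how many values it decodes."""

    def __init__(self, values: list):
        self._values = values
        self.decoded = 0

    def __len__(self) -> int:
        return len(self._values)

    def read(self, offset: int):
        # Decoding is the expensive step that deleted rows should never reach.
        self.decoded += 1
        return self._values[offset]


def scan_with_deletion_vector(
    columns: Dict[str, CountingColumn],
    deleted_positions: Set[int],
    first_row_position: int = 0,
) -> List[dict]:
    """Return the live rows of one data file.

    deleted_positions models the Deletion Vector as a set of file-level row
    positions (an assumption made for brevity).
    """
    num_rows = min((len(col) for col in columns.values()), default=0)

    # Step 1: evaluate the DV using row positions only, no column data needed.
    live_offsets = [
        offset
        for offset in range(num_rows)
        if first_row_position + offset not in deleted_positions
    ]

    # Step 2: materialize column values only for rows that are still live.
    return [
        {name: col.read(offset) for name, col in columns.items()}
        for offset in live_offsets
    ]


columns = {
    "id": CountingColumn([10, 11, 12, 13]),
    "name": CountingColumn(["a", "b", "c", "d"]),
}
rows = scan_with_deletion_vector(columns, deleted_positions={1, 3})
```

In this sketch the rows at positions 1 and 3 are dropped inside the scan itself, so each column decodes two values instead of four and no separate delete operator sees the deleted rows.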


