[ https://issues.apache.org/jira/browse/IMPALA-11484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy resolved IMPALA-11484.
----------------------------------------
    Fix Version/s: Impala 4.2.0
       Resolution: Fixed

> Create SCAN plan for Iceberg V2 position delete tables
> ------------------------------------------------------
>
>                 Key: IMPALA-11484
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11484
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>             Fix For: Impala 4.2.0
>
>
> Iceberg position delete files store the full URIs and file positions of 
> rows that are deleted. Therefore we can do an ANTI HASH JOIN between data 
> files and delete files to retrieve only the active (non-deleted) rows.
> For the data file rows we need to get the virtual columns INPUT_FILE_NAME and 
> FILE_POSITION, while in the delete files we need to retrieve the columns 
> 'file_path' and 'pos': https://iceberg.apache.org/spec/#position-delete-files
> Since the data files use the table schema while the delete files use a 
> different schema, we need to create a virtual table for the delete files with 
> the corresponding schema.
> Iceberg tells us which delete files must be applied to which data files, so 
> if a data file doesn't have a corresponding delete file, its content can 
> simply be UNION'ed with the output of the ANTI HASH JOIN.
> See more information in the design doc: 
> https://docs.google.com/document/d/1WF_UOanQ61RUuQlM4LaiRWI0YXpPKZ2VEJ8gyJdDyoY/edit#heading=h.5gc49pcc2543
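
A minimal Python sketch of the plan shape described above, under stated assumptions. It models data files and position delete rows as plain in-memory structures, not Impala's frontend classes or Iceberg's API, and the function name `scan_v2_table`, its parameters, and the file paths are hypothetical. Rows of data files that have delete files go through a left anti hash join on (INPUT_FILE_NAME, FILE_POSITION) = (file_path, pos). Data files without delete files bypass the join, and the two branches are concatenated (UNION ALL).

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical in-memory model (NOT Impala's or Iceberg's API):
#   data_files: data file path -> list of rows. The dict key plays the role of
#               the INPUT_FILE_NAME virtual column, and a row's index in the
#               list plays the role of FILE_POSITION (0-based, like Iceberg's pos).
#   delete_rows: (file_path, pos) pairs read from position delete files,
#                i.e. the rows of the virtual delete-file table.
#   files_with_deletes: data files that Iceberg says have delete files applied.


def scan_v2_table(
    data_files: Dict[str, List[str]],
    delete_rows: Iterable[Tuple[str, int]],
    files_with_deletes: Set[str],
) -> List[str]:
    # Hash table side of the anti join, keyed on (file_path, pos).
    deleted: Set[Tuple[str, int]] = set(delete_rows)

    # Probe side: rows of data files that have delete files, keyed on
    # (INPUT_FILE_NAME, FILE_POSITION). Keep only rows with no match.
    anti_join_out = [
        row
        for path in sorted(files_with_deletes)
        for pos, row in enumerate(data_files.get(path, []))
        if (path, pos) not in deleted
    ]

    # Data files without delete files are scanned directly, with no join.
    untouched = [
        row
        for path in sorted(data_files)
        if path not in files_with_deletes
        for row in data_files[path]
    ]

    # UNION ALL of both branches.
    return anti_join_out + untouched


if __name__ == "__main__":
    data = {
        "s3://bucket/t/data/a.parquet": ["a0", "a1", "a2"],
        "s3://bucket/t/data/b.parquet": ["b0", "b1"],
    }
    deletes = [("s3://bucket/t/data/a.parquet", 1)]
    print(scan_v2_table(data, deletes, {"s3://bucket/t/data/a.parquet"}))
    # ['a0', 'a2', 'b0', 'b1']
```

Only files listed in `files_with_deletes` pay the cost of the join; the rest of the table is streamed straight into the union, which mirrors the reasoning in the description above.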



--
This message was sent by Atlassian Jira
(v8.20.10#820010)