[
https://issues.apache.org/jira/browse/IMPALA-11484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zoltán Borók-Nagy resolved IMPALA-11484.
----------------------------------------
Fix Version/s: Impala 4.2.0
Resolution: Fixed
> Create SCAN plan for Iceberg V2 position delete tables
> ------------------------------------------------------
>
> Key: IMPALA-11484
> URL: https://issues.apache.org/jira/browse/IMPALA-11484
> Project: IMPALA
> Issue Type: New Feature
> Components: Frontend
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Labels: impala-iceberg
> Fix For: Impala 4.2.0
>
>
> Iceberg position delete files store the full URIs and file positions of
> rows that are deleted. Therefore we can do an ANTI HASH JOIN between data
> files and delete files to retrieve only the active rows.
> For the data file rows we need to get the virtual columns INPUT_FILE_NAME and
> FILE_POSITION, while in the delete files we need to retrieve the columns
> 'file_path' and 'pos': https://iceberg.apache.org/spec/#position-delete-files
> Since the data files use the table schema and the delete files use a
> different schema, we need to create a virtual table for the delete files with
> the corresponding schema.
> Iceberg tells us which delete files must be applied to which data files, so
> if a data file doesn't have a corresponding delete file, its content can
> simply be UNION'ed with the output of the ANTI HASH JOIN.
> See more information in the design doc:
> https://docs.google.com/document/d/1WF_UOanQ61RUuQlM4LaiRWI0YXpPKZ2VEJ8gyJdDyoY/edit#heading=h.5gc49pcc2543
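For readers unfamiliar with the plan shape described in the issue, here is a
minimal Python sketch of the idea, not Impala's actual implementation. It
assumes data files are in-memory lists of row values, that delete records are
(file_path, pos) pairs with 0-based positions, and that the set of data files
with deletes can be derived from the delete records themselves (in practice
Iceberg metadata provides this mapping). The names scan_file, anti_hash_join
and plan_scan are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the real structures:
# a scanned row carries the two virtual columns plus its value.
ScannedRow = Tuple[str, int, str]   # (INPUT_FILE_NAME, FILE_POSITION, value)
DeleteRecord = Tuple[str, int]      # (file_path, pos) from a position delete file


def scan_file(path: str, rows: List[str]) -> List[ScannedRow]:
    """Scan one data file, attaching file name and (assumed 0-based) position."""
    return [(path, pos, value) for pos, value in enumerate(rows)]


def anti_hash_join(probe: List[ScannedRow],
                   deletes: List[DeleteRecord]) -> List[ScannedRow]:
    """Keep probe rows whose (file name, position) has no match on the build side."""
    build = set(deletes)  # hash table built from the delete records
    return [row for row in probe if (row[0], row[1]) not in build]


def plan_scan(data_files: Dict[str, List[str]],
              deletes: List[DeleteRecord]) -> List[ScannedRow]:
    """UNION of the anti-joined rows and the rows of files without deletes."""
    # Simplification: derive the affected files from the delete records.
    paths_with_deletes = {path for path, _ in deletes}
    with_deletes: List[ScannedRow] = []
    without_deletes: List[ScannedRow] = []
    for path, rows in data_files.items():
        target = with_deletes if path in paths_with_deletes else without_deletes
        target.extend(scan_file(path, rows))
    return anti_hash_join(with_deletes, deletes) + without_deletes


# Example input: one row of a.parquet is deleted, b.parquet has no deletes.
example_files = {
    "s3://bucket/a.parquet": ["a0", "a1", "a2"],
    "s3://bucket/b.parquet": ["b0", "b1"],
}
example_deletes = [("s3://bucket/a.parquet", 1)]
active_rows = plan_scan(example_files, example_deletes)
```

Only the rows of a.parquet go through the join here; the rows of b.parquet
bypass it entirely, which is the point of the UNION branch in the description.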