[ 
https://issues.apache.org/jira/browse/IMPALA-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18080056#comment-18080056
 ] 

Noémi Pap-Takács commented on IMPALA-14908:
-------------------------------------------

OPTIMIZE removes the equality delete files that apply to the data files it 
rewrites, but it cannot remove dangling delete files. I don't think that 
should be in the scope of OPTIMIZE, as going through all manifest entries to 
detect those files could be very expensive.
It could be a separate maintenance feature if needed. Currently, Spark's 
[rewrite_data_files 
action|https://iceberg.apache.org/docs/latest/spark-procedures/#general-options:~:text=Remove%20dangling%20position%20and%20equality%20deletes%20after%20rewriting.%20A%20delete%20file%20is%20considered%20dangling%20if%20it%20does%20not%20apply%20to%20any%20live%20data%20files.%20Enabling%20this%20will%20generate%20an%20additional%20commit%20for%20the%20removal]
 can do the trick.
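
To illustrate why detecting dangling delete files means looking at every manifest entry, here is a minimal Python sketch. It is not Impala or Iceberg code: the classes and the find_dangling helper are hypothetical, and the applicability rule is simplified to "same partition, and the data file's sequence number is strictly lower than the equality delete's" (deletes written against an unpartitioned spec are ignored here).

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-ins for Iceberg manifest entries.
@dataclass(frozen=True)
class DataFile:
    path: str
    partition: str
    sequence_number: int


@dataclass(frozen=True)
class EqualityDeleteFile:
    path: str
    partition: str
    sequence_number: int


def applies_to(delete: EqualityDeleteFile, data: DataFile) -> bool:
    # Simplified rule (assumption): an equality delete applies to older
    # data files in the same partition.
    return (delete.partition == data.partition
            and data.sequence_number < delete.sequence_number)


def find_dangling(deletes, live_data):
    # A delete file is dangling if it applies to no live data file.
    # Every delete file is checked against every live data file, which
    # is why this can get expensive on large tables.
    return [d for d in deletes
            if not any(applies_to(d, f) for f in live_data)]


# After a rewrite, the only live data file is newer than eq-old.parquet.
live_data = [DataFile("data-new.parquet", "p=1", 5)]
deletes = [
    EqualityDeleteFile("eq-old.parquet", "p=1", 3),
    EqualityDeleteFile("eq-new.parquet", "p=1", 7),
]
dangling = find_dangling(deletes, live_data)
```

In this toy setup only eq-old.parquet ends up in dangling, because the rewritten data file is newer than it, while eq-new.parquet still applies to the live data file.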

> OPTIMIZE statement leaves equality-delete files in metadata
> -----------------------------------------------------------
>
>                 Key: IMPALA-14908
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14908
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Peter Rozsa
>            Assignee: Noémi Pap-Takács
>            Priority: Major
>
> OPTIMIZE uses planFiles to collect all data files with associated deletes 
> during the catalog finalization phase. Iceberg's planFiles applies 
> column-range statistics to prune equality-delete files from scan tasks - if a 
> delete file's target value does not overlap with a data file's column bounds, 
> it is excluded from that file's FileScanTask.deletes(). As a result, the 
> rewrite operation never sees those equality-delete files, and they are not 
> passed to rewrite.deleteFile(). The new snapshot therefore still contains the 
> equality-delete files after OPTIMIZE completes.
> Steps to reproduce (rollback required after execution):
> OPTIMIZE TABLE functional_parquet.iceberg_v2_delete_equality;
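
The pruning described in the quoted issue can be sketched roughly as follows. This is a toy Python model under stated assumptions, not Iceberg's planFiles implementation: FileBounds, deletes_per_task and unseen_deletes are hypothetical names, and each file is reduced to integer lower/upper bounds on a single column.

```python
from dataclasses import dataclass


# Hypothetical model: one column's lower/upper bounds per file.
@dataclass(frozen=True)
class FileBounds:
    path: str
    lower: int
    upper: int


def ranges_overlap(a: FileBounds, b: FileBounds) -> bool:
    # Two closed intervals overlap if each starts before the other ends.
    return a.lower <= b.upper and b.lower <= a.upper


def deletes_per_task(data_files, delete_files):
    # Attach a delete file to a data file's task only if bounds overlap,
    # mimicking the statistics-based pruning described in the issue.
    return {d.path: [e.path for e in delete_files if ranges_overlap(d, e)]
            for d in data_files}


def unseen_deletes(data_files, delete_files):
    # Delete files that appear in no task are never seen by the rewrite,
    # so they would stay in the table metadata.
    tasks = deletes_per_task(data_files, delete_files)
    seen = {path for paths in tasks.values() for path in paths}
    return [e.path for e in delete_files if e.path not in seen]


data_files = [FileBounds("data-1.parquet", 1, 10)]
delete_files = [
    FileBounds("eq-delete-a.parquet", 5, 5),    # inside the data bounds
    FileBounds("eq-delete-b.parquet", 42, 42),  # outside the data bounds
]
tasks = deletes_per_task(data_files, delete_files)
left_behind = unseen_deletes(data_files, delete_files)
```

Here eq-delete-b.parquet is pruned from the only scan task, so a rewrite driven purely by the task list would never pass it on for removal.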



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

