[ 
https://issues.apache.org/jira/browse/HIVE-27850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783580#comment-17783580
 ] 

Peter Vary commented on HIVE-27850:
-----------------------------------

Hi [~difin],

Thanks for reaching out!

Indeed, I am planning to work on Iceberg compaction in Flink jobs, so it is 
somewhat related. Our use case is different, as we do not have the option to 
rewrite the whole table. Our goal is to incrementally compact the freshly 
arrived files, and maybe convert the equality delete files to positional 
delete files so that merge-on-read operations become cheaper. A full table 
rewrite would come as a side benefit, but the main goal is to provide a 
less resource-intensive compaction for the new (never before compacted) files.

I was thinking that Hive might also benefit from moving the Spark-related 
compaction code to some generic place, where Spark, Flink, and Hive 
could reuse the compaction features already written by the Iceberg-Spark team.

Thanks,

Peter

> Compaction for Iceberg tables
> -----------------------------
>
>                 Key: HIVE-27850
>                 URL: https://issues.apache.org/jira/browse/HIVE-27850
>             Project: Hive
>          Issue Type: New Feature
>          Components: Iceberg integration
>            Reporter: Dmitriy Fingerman
>            Assignee: Dmitriy Fingerman
>            Priority: Major
>              Labels: pull-request-available
>
> Hive currently doesn't support compaction of Iceberg tables. Implementing 
> this feature would be highly beneficial for performance, because compaction 
> creates larger data files and eliminates positional delete files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
