[
https://issues.apache.org/jira/browse/IMPALA-14075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17956188#comment-17956188
]
Riza Suminto commented on IMPALA-14075:
---------------------------------------
Filed https://gerrit.cloudera.org/c/22980/
> Parallelize delete operations of EXPIRE_SNAPSHOTS
> -------------------------------------------------
>
> Key: IMPALA-14075
> URL: https://issues.apache.org/jira/browse/IMPALA-14075
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Zoltán Borók-Nagy
> Assignee: Riza Suminto
> Priority: Major
> Labels: impala-iceberg
>
> Currently Impala executes the EXPIRE_SNAPSHOTS operation on a single thread.
> This can be very slow on cloud storage systems, especially when the operation
> needs to remove many files.
> It is possible to run the delete operations in parallel by passing an
> ExecutorService object to ExpireSnapshots:
> {noformat}
> ExpireSnapshots executeDeleteWith(ExecutorService executorService);
> {noformat}
> [https://github.com/apache/iceberg/blob/31c315f695aad544a096a5a2ffdde54a97b90b28/api/src/main/java/org/apache/iceberg/ExpireSnapshots.java#L100]
> For reference, Hive uses 4 threads to execute the deletes:
> [https://github.com/apache/hive/blob/08067725bc6e8810579324736a0aac453c06bf7b/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2239-L2241]
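As a rough illustration of the idea in the description, the sketch below runs delete tasks on a fixed pool of 4 threads (the figure cited for Hive). It uses only the JDK so that it runs on its own; `ParallelDeleteSketch`, `deleteAll`, `DELETE_THREADS`, and the file names are hypothetical, and the delete callback is a stand-in for a real filesystem delete. It does not reflect the change under review in Gerrit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public class ParallelDeleteSketch {
  // Hypothetical pool size, mirroring the 4 threads cited for Hive.
  static final int DELETE_THREADS = 4;

  /**
   * Submits one delete task per path to the pool and waits for all of them.
   * Returns the number of tasks that completed without throwing.
   */
  static int deleteAll(List<String> paths, Consumer<String> deleteFunc, ExecutorService pool)
      throws InterruptedException, ExecutionException {
    List<Future<?>> pending = new ArrayList<>();
    for (String path : paths) {
      pending.add(pool.submit(() -> deleteFunc.accept(path)));
    }
    int completed = 0;
    for (Future<?> f : pending) {
      f.get(); // rethrows a task failure as ExecutionException
      completed++;
    }
    return completed;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(DELETE_THREADS);
    // Stand-in for real deletes: record each path in a thread-safe set.
    Set<String> deleted = ConcurrentHashMap.newKeySet();
    try {
      int n = deleteAll(List.of("f1.parquet", "f2.parquet", "f3.parquet"), deleted::add, pool);
      System.out.println("deleted " + n + " files");
    } finally {
      pool.shutdown();
    }
  }
}
```

With Iceberg on the classpath, the same pool would presumably be handed to the API quoted above, along the lines of `table.expireSnapshots().executeDeleteWith(pool).commit()`, leaving Iceberg to schedule the individual file deletes. The caller remains responsible for shutting the pool down.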
--
This message was sent by Atlassian Jira
(v8.20.10#820010)