Zoltán Borók-Nagy created IMPALA-14075:
------------------------------------------
Summary: Parallelize delete operations of EXPIRE_SNAPSHOTS
Key: IMPALA-14075
URL: https://issues.apache.org/jira/browse/IMPALA-14075
Project: IMPALA
Issue Type: Improvement
Reporter: Zoltán Borók-Nagy
Currently Impala executes the EXPIRE_SNAPSHOTS operation on a single thread. This
can be very slow on cloud storage systems, especially when the operation needs to
remove a large number of files.
It is possible to run the delete operations in parallel by passing an
ExecutorService object to ExpireSnapshots:
{noformat}
ExpireSnapshots executeDeleteWith(ExecutorService executorService);
{noformat}
[https://github.com/apache/iceberg/blob/31c315f695aad544a096a5a2ffdde54a97b90b28/api/src/main/java/org/apache/iceberg/ExpireSnapshots.java#L100]
For reference, Hive uses 4 threads to execute the deletes:
[https://github.com/apache/hive/blob/08067725bc6e8810579324736a0aac453c06bf7b/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2239-L2241]
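To illustrate the idea, here is a minimal, self-contained sketch of running file deletes on a fixed-size thread pool. It is not Impala's implementation: the class name ParallelDeleteSketch, the constant DELETE_THREADS, and the helper deleteAll are hypothetical, and the pool size of 4 is only borrowed from the Hive reference above. In a real change the pool would instead be handed to Iceberg, as noted in the comment inside the code.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelDeleteSketch {
  // Hypothetical default, chosen to mirror the 4 threads Hive uses.
  public static final int DELETE_THREADS = 4;

  /**
   * Deletes the given files on the supplied pool and returns how many
   * files existed and were removed. Local files stand in for the data
   * and metadata files of expired snapshots.
   */
  public static int deleteAll(List<Path> files, ExecutorService pool)
      throws InterruptedException, ExecutionException {
    List<Future<Boolean>> results = new ArrayList<>();
    for (Path file : files) {
      // Each delete is an independent task, so the pool can overlap the
      // per-file latency that dominates on cloud storage.
      Callable<Boolean> task = () -> Files.deleteIfExists(file);
      results.add(pool.submit(task));
    }
    int deleted = 0;
    for (Future<Boolean> result : results) {
      // get() rethrows any failure from the task as an ExecutionException.
      if (result.get()) {
        deleted++;
      }
    }
    return deleted;
  }

  public static void main(String[] args) throws Exception {
    Path dir = Files.createTempDirectory("expire-sketch");
    List<Path> files = new ArrayList<>();
    for (int i = 0; i < 20; i++) {
      files.add(Files.createTempFile(dir, "data-", ".parquet"));
    }

    ExecutorService pool = Executors.newFixedThreadPool(DELETE_THREADS);
    try {
      // With Iceberg, the same pool would be passed to the API instead
      // ('table' and 'cutoffMillis' are assumed to exist in the caller):
      //   table.expireSnapshots()
      //        .expireOlderThan(cutoffMillis)
      //        .executeDeleteWith(pool)
      //        .commit();
      System.out.println("deleted " + deleteAll(files, pool) + " files");
    } finally {
      pool.shutdown();
    }
    Files.delete(dir);
  }
}
```

The pool is created and shut down by the caller, so its size could later be driven by a configuration option rather than a constant.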