Zoltán Borók-Nagy created IMPALA-14075:
------------------------------------------

             Summary: Parallelize delete operations of EXPIRE_SNAPSHOTS
                 Key: IMPALA-14075
                 URL: https://issues.apache.org/jira/browse/IMPALA-14075
             Project: IMPALA
          Issue Type: Improvement
            Reporter: Zoltán Borók-Nagy


Currently Impala executes the EXPIRE_SNAPSHOTS operation on a single thread. This
can be very slow on cloud storage systems, especially when the operation needs to
remove a large number of files.

It is possible to run the delete operations in parallel by passing an 
ExecutorService object to ExpireSnapshots:
{noformat}
ExpireSnapshots executeDeleteWith(ExecutorService executorService);
{noformat}
[https://github.com/apache/iceberg/blob/31c315f695aad544a096a5a2ffdde54a97b90b28/api/src/main/java/org/apache/iceberg/ExpireSnapshots.java#L100]

For reference, Hive uses 4 threads to execute the deletes:

[https://github.com/apache/hive/blob/08067725bc6e8810579324736a0aac453c06bf7b/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2239-L2241]
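To illustrate the idea, here is a minimal, standalone sketch of fanning file deletes out over a fixed-size pool. It uses only the JDK and local temporary files, so it is not Impala's or Iceberg's actual code. The class name, the helper deleteInParallel, and the constant DELETE_THREADS are hypothetical; the value 4 simply mirrors the Hive thread count mentioned above. In Impala, a pool like this would presumably be handed to ExpireSnapshots via executeDeleteWith, as the comment in the code suggests.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ParallelDeleteSketch {
  // Hypothetical default, chosen to match the 4 threads Hive is said to use.
  static final int DELETE_THREADS = 4;

  /** Deletes the given files on a fixed-size pool; returns how many were removed. */
  static int deleteInParallel(List<Path> files, int numThreads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      // With Iceberg on the classpath, the same pool could instead be passed on
      // (table and cutoffMillis are assumed to exist in the caller):
      //   table.expireSnapshots()
      //        .expireOlderThan(cutoffMillis)
      //        .executeDeleteWith(pool)
      //        .commit();
      List<Future<Boolean>> results = new ArrayList<>();
      for (Path file : files) {
        Callable<Boolean> task = () -> Files.deleteIfExists(file);
        results.add(pool.submit(task));
      }
      int deleted = 0;
      for (Future<Boolean> result : results) {
        // get() blocks until the delete finishes and rethrows any failure.
        if (result.get()) {
          deleted++;
        }
      }
      return deleted;
    } finally {
      // Always release the threads, even if a delete failed.
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.MINUTES);
    }
  }

  public static void main(String[] args) throws Exception {
    Path dir = Files.createTempDirectory("expire-sketch");
    List<Path> files = new ArrayList<>();
    for (int i = 0; i < 20; i++) {
      files.add(Files.createTempFile(dir, "data-", ".parquet"));
    }
    int deleted = deleteInParallel(files, DELETE_THREADS);
    System.out.println("deleted " + deleted + " files");
    Files.delete(dir);
  }
}
```

The pool is shut down in a finally block so that worker threads do not outlive the operation. Whether the pool should be created per statement or shared, and whether its size should be configurable, are open design choices for the actual change.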


