aokolnychyi commented on code in PR #6876:
URL: https://github.com/apache/iceberg/pull/6876#discussion_r1113624056
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java:
##########
@@ -229,40 +215,23 @@ private void commitOperation(SnapshotUpdate<?> operation, String description) {
private void abort(WriterCommitMessage[] messages) {
if (cleanupOnAbort) {
- Map<String, String> props = table.properties();
- Tasks.foreach(files(messages))
- .executeWith(ThreadPools.getWorkerPool())
- .retry(PropertyUtil.propertyAsInt(props, COMMIT_NUM_RETRIES, COMMIT_NUM_RETRIES_DEFAULT))
- .exponentialBackoff(
- PropertyUtil.propertyAsInt(
- props, COMMIT_MIN_RETRY_WAIT_MS, COMMIT_MIN_RETRY_WAIT_MS_DEFAULT),
- PropertyUtil.propertyAsInt(
- props, COMMIT_MAX_RETRY_WAIT_MS, COMMIT_MAX_RETRY_WAIT_MS_DEFAULT),
- PropertyUtil.propertyAsInt(
- props, COMMIT_TOTAL_RETRY_TIME_MS, COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT),
- 2.0 /* exponential */)
- .throwFailureWhenFinished()
- .run(
- file -> {
- table.io().deleteFile(file.path().toString());
- });
+ SparkCleanupUtil.deleteFiles("job abort", table.io(), files(messages));
} else {
- LOG.warn(
- "Skipping cleaning up of data files because Iceberg was unable to determine the final commit state");
+ LOG.warn("Skipping cleanup of written files, unable to determine the final commit state");
}
}
- private Iterable<DataFile> files(WriterCommitMessage[] messages) {
- if (messages.length > 0) {
- return Iterables.concat(
- Iterables.transform(
- Arrays.asList(messages),
- message ->
- message != null
- ? ImmutableList.copyOf(((TaskCommit) message).files())
- : ImmutableList.of()));
- }
- return ImmutableList.of();
+ private List<DataFile> files(WriterCommitMessage[] messages) {
Review Comment:
I changed the code to keep a list of files (this shouldn't cost anything extra,
as those files are already there) and switched to using `Lists.transform()` in
`SparkCleanupUtil`, which is a lazy transform: it returns a view over the list
instead of copying it.
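
To illustrate what "lazy transform" means here, below is a minimal sketch using only the JDK. `lazyTransform` is a hypothetical stand-in for Guava's `Lists.transform()`, and the file names are made up; this is not the code in `SparkCleanupUtil`. It only shows that the function runs when an element is read, not when the view is created.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class LazyTransformSketch {

  // Hypothetical stand-in for Guava's Lists.transform(): returns a read-only
  // view that applies fn each time an element is accessed, without copying.
  static <F, T> List<T> lazyTransform(List<F> source, Function<? super F, ? extends T> fn) {
    return new AbstractList<T>() {
      @Override
      public T get(int index) {
        return fn.apply(source.get(index));
      }

      @Override
      public int size() {
        return source.size();
      }
    };
  }

  public static void main(String[] args) {
    // Assumed example data; in the PR these would be the files from commit messages.
    List<String> files = Arrays.asList("data/file-a.parquet", "data/file-b.parquet");
    int[] calls = {0};

    List<String> paths =
        lazyTransform(
            files,
            file -> {
              calls[0]++;
              return "/warehouse/" + file;
            });

    // No element has been transformed yet; the view only wraps the source list.
    System.out.println("calls after creating the view: " + calls[0]);

    // Each access applies the function on demand.
    for (String path : paths) {
      System.out.println(path);
    }
    System.out.println("calls after iterating: " + calls[0]);
  }
}
```

Because the view does no caching, reading an element twice applies the function twice, so this fits cheap conversions such as extracting a path from a file.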
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]