danielcweeks commented on a change in pull request #1767:
URL: https://github.com/apache/iceberg/pull/1767#discussion_r524452128
##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -87,18 +168,105 @@ public void close() throws IOException {
super.close();
closed = true;
+ currentStagingFile = null;
try {
stream.close();
-      s3.putObject(
-          PutObjectRequest.builder().bucket(location.bucket()).key(location.key()).build(),
-          RequestBody.fromFile(stagingFile));
+ completeUploads();
} finally {
- if (!stagingFile.delete()) {
- LOG.warn("Could not delete temporary file: {}", stagingFile);
+ stagingFiles.forEach(f -> {
Review comment:
       @rdblue looking at `Tasks.foreach`, I'm not sure how well it fits this
case. It looks like `Tasks` assumes you know the full set of work units up
front, submits them in parallel, and blocks. Here, we are progressively adding
more work as the stream is written and then need to block across all of it at
the end to complete the multipart upload (roughly the pattern sketched below).
I could wrap the `Tasks.foreach` call in a future, but then we're just
wrapping one async framework in another.
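
   To make that concrete, here is the shape I'm picturing (a minimal sketch
with illustrative names like `uploadPartAsync` and `completeUploads`, not the
actual patch): each rolled-over staging file kicks off an async part upload,
and `close()` blocks across everything submitted so far.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.CompletableFuture;

   class MultipartUploadSketch {
     // Futures accumulate as the stream rolls over staging files; the full
     // set of parts isn't known until close(), which is why Tasks.foreach
     // (which expects all work up front) is an awkward fit.
     private final List<CompletableFuture<Void>> uploads = new ArrayList<>();

     // Called each time a staging file fills up.
     void uploadPartAsync(Runnable partUpload) {
       uploads.add(CompletableFuture.runAsync(partUpload));
     }

     // Called once from close(): block across all submitted part uploads,
     // then complete the multipart upload.
     void completeUploads() {
       CompletableFuture.allOf(uploads.toArray(new CompletableFuture[0])).join();
       // ... send CompleteMultipartUploadRequest here ...
     }
   }
   ```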
   I think we can handle the concerns of deleting the staging files, aborting
the upload, and logging with the `CompletableFuture` framework, and leave
retries to the S3Client (to avoid retries on top of retries).
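
   For the cleanup concerns, something along these lines (again just a
sketch; `withCleanup` and the log messages are illustrative): attach deletion
and failure logging to each future via `whenComplete`, and let the S3Client's
own retry policy handle transient failures.

   ```java
   import java.io.File;
   import java.util.concurrent.CompletableFuture;
   import org.slf4j.Logger;
   import org.slf4j.LoggerFactory;

   class UploadCleanupSketch {
     private static final Logger LOG = LoggerFactory.getLogger(UploadCleanupSketch.class);

     // Attach cleanup and logging directly to each upload future; retries
     // are left to the S3Client rather than layered on top.
     static CompletableFuture<Void> withCleanup(CompletableFuture<Void> upload, File stagingFile) {
       return upload.whenComplete((result, error) -> {
         if (!stagingFile.delete()) {
           LOG.warn("Could not delete temporary file: {}", stagingFile);
         }
         if (error != null) {
           LOG.error("Failed to upload part from staging file: {}", stagingFile, error);
           // a failed part would also trigger AbortMultipartUploadRequest here
         }
       });
     }
   }
   ```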
Thoughts?