danielcweeks commented on a change in pull request #1767:
URL: https://github.com/apache/iceberg/pull/1767#discussion_r524452128



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -87,18 +168,105 @@ public void close() throws IOException {
 
     super.close();
     closed = true;
+    currentStagingFile = null;
 
     try {
       stream.close();
 
-      s3.putObject(
-          PutObjectRequest.builder().bucket(location.bucket()).key(location.key()).build(),
-          RequestBody.fromFile(stagingFile));
+      completeUploads();
     } finally {
-      if (!stagingFile.delete()) {
-        LOG.warn("Could not delete temporary file: {}", stagingFile);
+      stagingFiles.forEach(f -> {

Review comment:
       @rdblue looking at `Tasks.foreach`, I'm not sure how well it fits in this case. It looks like `Tasks` assumes that you know the full units of work up front and will submit them in parallel and block. In this case, we are progressively adding more work and then need to block across all of it at the end to complete the multipart upload. Now, I could wrap the `Tasks.foreach` call in a future, but that seems like we're just wrapping one async framework with another.
   
   I think we can handle the concerns of deleting the files, aborting the upload, and logging with the `CompletableFuture` framework, and retries should be left to the S3Client (trying to avoid retries on top of retries).
   
   Thoughts?
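   
   For reference, the pattern I have in mind looks roughly like this. This is only a minimal sketch: `uploadPart` is a hypothetical stand-in for the real SDK part-upload call, and the abort/delete cleanup is represented by a log line, not actual S3 or file operations.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.CompletableFuture;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   
   public class MultipartUploadSketch {
     // Hypothetical stand-in for a single S3 part upload; retries would be
     // handled inside the S3Client, not re-layered here.
     static String uploadPart(int partNumber) {
       return "etag-" + partNumber;
     }
   
     public static void main(String[] args) {
       ExecutorService executor = Executors.newFixedThreadPool(4);
       List<CompletableFuture<String>> parts = new ArrayList<>();
   
       // Work is added progressively as the stream is written; each staged
       // part kicks off its own async upload.
       for (int i = 1; i <= 3; i++) {
         final int partNumber = i;
         parts.add(CompletableFuture.supplyAsync(() -> uploadPart(partNumber), executor));
       }
   
       // At close(), block across all outstanding part uploads before
       // completing the multipart upload.
       CompletableFuture.allOf(parts.toArray(new CompletableFuture[0]))
           .whenComplete((ok, err) -> {
             if (err != null) {
               // Failure path: abort the multipart upload, delete staging
               // files, and log (placeholder only in this sketch).
               System.err.println("Upload failed, aborting: " + err);
             }
           })
           .join();
   
       parts.forEach(p -> System.out.println(p.join()));
       executor.shutdown();
     }
   }
   ```
   
   The point is that `allOf(...).join()` gives us the single blocking point at close, while `whenComplete` centralizes cleanup and logging without wrapping one async framework in another.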




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
