kumarpritam863 commented on code in PR #15210:
URL: https://github.com/apache/iceberg/pull/15210#discussion_r2752408825


##########
aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java:
##########
@@ -407,6 +407,9 @@ private void cleanUpStagingFiles() {
         .suppressFailureWhenFinished()
         .onFailure((file, thrown) -> LOG.warn("Failed to delete staging file: {}", file, thrown))
         .run(File::delete);
+    // clear staging files and multipart map
+    stagingFiles.clear();
+    multiPartMap.clear();

Review Comment:
   Thanks @singhpk234 for the review. 
   
   Regarding memory management:
   While the objects in the staging files list become eligible for garbage 
collection once the list itself goes out of scope, I’m concerned that holding 
strong references to many FileAndDigest objects until then (especially in 
upload-heavy / long-running workloads) can still cause practical issues:
   - Increased heap pressure during periods of high concurrent or sequential 
uploads
   - Longer object lifetime → more frequent / longer GC pauses
   - Higher risk of OutOfMemoryError during peak load (I’ve sometimes observed 
OOMs in similar scenarios when large numbers of parts accumulate without 
cleanup while running Iceberg-Kafka-Connect)
   
   Even though the theoretical lifetime is finite, the practical memory 
pressure and GC overhead seem non-negligible in our use case.
   
   Also, although this does not affect the AWS multipart upload itself (AWS 
requires part numbers to be unique), starting the part numbers from 1 and 
keeping them in a low range makes CompleteMultipartUpload requests easier to 
manage. Currently the part number is derived from the index of the part file in 
the staging files list, so it can start from a higher number if the previous 
files are not cleared.
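
   To illustrate the part-number point, here is a minimal sketch. It is not the 
actual S3OutputStream code: the class `StagingPartNumbers` and its methods are 
hypothetical, and deriving the part number as "list index + 1" is an assumption 
about how the index is used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the staging bookkeeping; not the real S3OutputStream.
class StagingPartNumbers {
  private final List<String> stagingFiles = new ArrayList<>();

  // Stage a part file and return its part number.
  // Assumption: the part number is the 1-based index in the staging list.
  int stage(String fileName) {
    stagingFiles.add(fileName);
    return stagingFiles.indexOf(fileName) + 1;
  }

  // Simulates cleanup after the files are deleted. With clear == false the
  // list keeps its entries, so they stay strongly reachable.
  void cleanUp(boolean clear) {
    if (clear) {
      stagingFiles.clear();
    }
  }

  // Number of entries still referenced by the list.
  int retained() {
    return stagingFiles.size();
  }
}
```

   In this sketch, staging two files, calling `cleanUp(false)`, and staging a 
third yields part number 3 with three entries still retained; with 
`cleanUp(true)` the third file gets part number 1 and only one entry is retained.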
   
   Please let me know your thoughts on these.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

