kumarpritam863 commented on code in PR #15210:
URL: https://github.com/apache/iceberg/pull/15210#discussion_r2752408825
##########
aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java:
##########
@@ -407,6 +407,9 @@ private void cleanUpStagingFiles() {
.suppressFailureWhenFinished()
.onFailure((file, thrown) -> LOG.warn("Failed to delete staging file: {}", file, thrown))
.run(File::delete);
+ // clear staging files and multipart map
+ stagingFiles.clear();
+ multiPartMap.clear();
Review Comment:
Thanks @singhpk234 for the review.
Regarding memory management:
While the staging files list will eventually allow objects to be
garbage-collected once they go out of scope, I’m concerned that retaining
strong references to many FileAndDigest objects (especially in upload-heavy /
long-running workloads) can still cause practical issues:
- Increased heap pressure during periods of many concurrent or sequential
uploads
- Longer object lifetimes, which can lead to more frequent or longer GC pauses
- Higher risk of OutOfMemoryError during peak load (I’ve sometimes observed
OOMs in similar scenarios, when large numbers of parts accumulate without
cleanup while running Iceberg-Kafka-Connect)
Even though the theoretical lifetime is finite, the practical memory
pressure and GC overhead seem non-negligible in our use case.
Also, although this does not affect the AWS multipart upload itself (AWS only
requires part numbers to be unique), starting part numbers at 1 and keeping
them low makes CompleteMultipartUpload requests easier to manage. Currently
the part number is derived from the index of the part file in the staging
files list, so it can start from a higher number if the previous files are
not cleared.
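
To illustrate the part-numbering point, here is a minimal sketch. It is a simplified, hypothetical model and not the actual S3OutputStream code: the class `PartNumberSketch` and its methods `stage` and `cleanUp` are made up for illustration, and it assumes that more files are staged on the same list after a cleanup has run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of index-based part numbering.
// Names here are illustrative assumptions, not the real S3OutputStream API.
class PartNumberSketch {
  private final List<String> stagingFiles = new ArrayList<>();

  // Tracks a staging file and returns its 1-based part number,
  // derived from the file's position in the list.
  int stage(String file) {
    stagingFiles.add(file);
    return stagingFiles.indexOf(file) + 1;
  }

  // Models cleanup (file deletion omitted); optionally drops the tracked entries.
  void cleanUp(boolean clearList) {
    if (clearList) {
      stagingFiles.clear();
    }
  }

  public static void main(String[] args) {
    PartNumberSketch withoutClear = new PartNumberSketch();
    withoutClear.stage("part-a");
    withoutClear.stage("part-b");
    withoutClear.cleanUp(false);
    // Stale entries remain, so the next part number continues from 3.
    System.out.println(withoutClear.stage("part-c")); // prints 3

    PartNumberSketch withClear = new PartNumberSketch();
    withClear.stage("part-a");
    withClear.stage("part-b");
    withClear.cleanUp(true);
    // The list was cleared, so numbering restarts at 1.
    System.out.println(withClear.stage("part-c")); // prints 1
  }
}
```

In this model, clearing the list also drops the strong references to the already-deleted entries, which is the memory-related effect described above.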
Please let me know your thoughts on these.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]