wombatu-kun opened a new pull request, #16739: URL: https://github.com/apache/iceberg/pull/16739
`ResolvingFileIO` (the default FileIO for most engines) sets `init-creation-stacktrace=false` on the FileIO it delegates to, so the relatively expensive creation-stack capture is done once by `ResolvingFileIO` rather than repeatedly by the delegate (see the comment at `ResolvingFileIO`). `S3FileIO` honors that flag, but `S3InputStream` and `S3OutputStream` ignored it and always called `Thread.currentThread().getStackTrace()` in their constructors. That stack is only used to print a creation-site warning if a stream is garbage collected without being closed, yet it was captured on every stream creation regardless of the flag. This change makes both streams honor the flag through a new `S3FileIOProperties.isCreateStackTraceEnabled()` (default `true`, so default behavior and the leak diagnostics are unchanged). The `init-creation-stacktrace` key is promoted to a constant on `S3FileIOProperties` and reused by `S3FileIO` instead of the previous string literal. When capture is disabled and a stream is still leaked, the warning is still logged, just without the creation stack. Because streams are created per file opened or written, while `S3FileIO` is created roughly once per FileIO instance, the streams are the more frequent capture site. On the default `ResolvingFileIO` path the flag is already `false`, so this overhead is now actually avoided there. ## Benchmark A JMH benchmark that constructs an `S3InputStream` from representative call-stack depths, with creation-stack capture enabled vs disabled (`-prof gc`, JDK 21, 1 fork, 5+5 iterations). The benchmark is not included in this PR. The metric is allocation per stream construction (i.e. roughly per file); capture allocates one `StackTraceElement`, with several `String` fields, per stack frame. | Stack depth | capture | alloc B/op | |---|---|---| | 16 | enabled (current) | 39,934 | | 16 | disabled | 607 | | 64 | enabled (current) | 91,769 | | 64 | disabled | 656 | That is a 98-99% reduction in per-stream allocation when the flag is disabled, scaling with call-stack depth. For example, a query opening 10,000 files at depth 64 avoids roughly 900 MB of short-lived garbage that was only ever used for an unclosed-stream warning. Construction time also trends lower but is noisy at this granularity, so the allocation reduction is the reliable measure. `TestS3FileIOProperties` is extended to cover the new property (default value and round-trip); existing `S3InputStream` tests continue to pass. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
