wombatu-kun opened a new pull request, #16739:
URL: https://github.com/apache/iceberg/pull/16739

   `ResolvingFileIO` (the default FileIO for most engines) sets 
`init-creation-stacktrace=false` on the FileIO it delegates to, so the 
relatively expensive creation-stack capture is done once by `ResolvingFileIO` 
rather than repeatedly by the delegate (see the comment at `ResolvingFileIO`). 
`S3FileIO` honors that flag, but `S3InputStream` and `S3OutputStream` ignored 
it and always called `Thread.currentThread().getStackTrace()` in their 
constructors. That stack is only used to print a creation-site warning if a 
stream is garbage collected without being closed, yet it was captured on every 
stream creation regardless of the flag.
   
   This change makes both streams honor the flag through a new 
`S3FileIOProperties.isCreateStackTraceEnabled()` (default `true`, so default 
behavior and the leak diagnostics are unchanged). The 
`init-creation-stacktrace` key is promoted to a constant on 
`S3FileIOProperties` and reused by `S3FileIO` instead of the previous string 
literal. When capture is disabled and a stream is still leaked, the warning is 
still logged, just without the creation stack.
   
   Because streams are created per file opened or written, while `S3FileIO` is 
created roughly once per FileIO instance, the streams are the more frequent 
capture site. On the default `ResolvingFileIO` path the flag is already 
`false`, so this overhead is now actually avoided there.
   
   ## Benchmark
   
   A JMH benchmark that constructs an `S3InputStream` from representative 
call-stack depths, with creation-stack capture enabled vs disabled (`-prof gc`, 
JDK 21, 1 fork, 5+5 iterations). The benchmark is not included in this PR. The 
metric is allocation per stream construction (i.e. roughly per file); capture 
allocates one `StackTraceElement`, with several `String` fields, per stack 
frame.
   
   | Stack depth | capture | alloc B/op |
   |---|---|---|
   | 16 | enabled (current) | 39,934 |
   | 16 | disabled | 607 |
   | 64 | enabled (current) | 91,769 |
   | 64 | disabled | 656 |
   
   That is a 98-99% reduction in per-stream allocation when the flag is 
disabled, scaling with call-stack depth. For example, a query opening 10,000 
files at depth 64 avoids roughly 900 MB of short-lived garbage that was only 
ever used for an unclosed-stream warning. Construction time also trends lower 
but is noisy at this granularity, so the allocation reduction is the reliable 
measure.
   
   `TestS3FileIOProperties` is extended to cover the new property (default 
value and round-trip); existing `S3InputStream` tests continue to pass.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to