[
https://issues.apache.org/jira/browse/HADOOP-19330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17898350#comment-17898350
]
ASF GitHub Bot commented on HADOOP-19330:
-----------------------------------------
steveloughran opened a new pull request, #7160:
URL: https://github.com/apache/hadoop/pull/7160
If a file opened for reading through the S3A connector is not closed,
then when garbage collection takes place:
* An error message is reported at WARN, including the file name.
* A stack trace of where the stream was created is reported at INFO.
* A best-effort attempt is made to release any active HTTPS connection.
* The filesystem IOStatistic stream_leaks is incremented.
The intent is to make it easier to identify where streams are being opened
but not closed, as these consume resources, often including HTTPS connections
from a connection pool of limited size.
It MUST NOT be relied on as a way to clean up open files/streams
automatically; some of the normal actions of the close() method are omitted.
Instead, treat the warning messages and IOStatistics as a sign of a problem,
and the stack trace as a way of identifying which application code or library
needs to be investigated (see the sketch below).
Contributed by Steve Loughran
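To illustrate the behaviour this PR targets, here is a minimal sketch (not code from the patch) of a leaked stream versus a correctly closed one, followed by a dump of the filesystem's IOStatistics. The bucket and path are hypothetical, and the use of IOStatisticsSupport/IOStatisticsLogging to print the counters is an assumption about how one might read the `stream_leaks` statistic named above.

```java
// Sketch only: a leaked stream vs. a correctly closed one, and how the
// filesystem-level IOStatistics might be inspected afterwards. Everything
// beyond the standard FileSystem API is an assumption, not the PR's code.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.statistics.IOStatistics;
import org.apache.hadoop.fs.statistics.IOStatisticsLogging;
import org.apache.hadoop.fs.statistics.IOStatisticsSupport;

public class StreamLeakExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical bucket/path used for illustration.
    Path path = new Path("s3a://example-bucket/data/file.csv");
    FileSystem fs = FileSystem.get(path.toUri(), new Configuration());

    // Leaked: the stream is never closed, so its HTTPS connection stays
    // checked out of the pool until GC runs and the leak reporter fires.
    FSDataInputStream leaked = fs.open(path);
    leaked.read();

    // Correct: try-with-resources closes the stream promptly.
    try (FSDataInputStream in = fs.open(path)) {
      in.read();
    }

    // After a GC, the filesystem's IOStatistics would show the leak counter
    // (stream_leaks in the PR text) incremented for the unclosed stream.
    IOStatistics stats = IOStatisticsSupport.retrieveIOStatistics(fs);
    System.out.println(IOStatisticsLogging.ioStatisticsToPrettyString(stats));
  }
}
```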
### For code changes:
- [ ] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
> S3AInputStream.finalizer to warn if closed with http connection -then release
> it
> --------------------------------------------------------------------------------
>
> Key: HADOOP-19330
> URL: https://issues.apache.org/jira/browse/HADOOP-19330
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.4.1
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
>
> A recurring problem is that applications forget to close their input streams;
> eventually the HTTP connection pool runs out of connections.
> Having the finalizer close streams during GC will ensure that after a GC the
> HTTP connections are returned. While this is an improvement on today, it is
> insufficient:
> * it only happens during GC, so may not fix the problem entirely
> * it doesn't let developers know things are going wrong
> * it doesn't let us differentiate well between a stream leak and an overloaded FS
> Proposed enhancements (sketched below):
> * collect a stack trace in the constructor
> * log in finalize at WARN, including path, thread and stack trace
> * use a dedicated log for this, so it can be turned off in production (libraries
> telling end users off for developer errors is simply an annoyance)
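The following is a generic sketch of the pattern those bullets propose, not the actual S3AInputStream change: capture a stack trace and thread name at construction time, then warn from a GC-time callback (java.lang.ref.Cleaner here) if the stream was never closed. The class and logger names are illustrative assumptions.

```java
// Illustrative sketch of the leak-reporting pattern, not the HADOOP-19330 patch:
// record where the stream was created, then warn at GC time if it was never closed.
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LeakReportingStream implements AutoCloseable {
  // Dedicated logger so operators can silence leak reports in production.
  private static final Logger LEAK_LOG =
      LoggerFactory.getLogger("org.example.stream.leaks"); // hypothetical logger name
  private static final Cleaner CLEANER = Cleaner.create();

  private final Cleaner.Cleanable cleanable;
  private final LeakState state;

  // Holds everything the GC-time callback needs; it must not reference the
  // stream itself, or the stream would never become unreachable.
  private static final class LeakState implements Runnable {
    private final String path;
    private final String thread = Thread.currentThread().getName();
    private final Exception creationSite = new Exception("stream created here");
    private final AtomicBoolean closed = new AtomicBoolean(false);

    LeakState(String path) {
      this.path = path;
    }

    @Override
    public void run() {
      if (!closed.get()) {
        LEAK_LOG.warn("Stream for {} opened in thread {} was never closed", path, thread);
        LEAK_LOG.info("Stream creation site", creationSite);
        // Best effort: release the underlying connection / increment a leak counter here.
      }
    }
  }

  public LeakReportingStream(String path) {
    this.state = new LeakState(path);
    this.cleanable = CLEANER.register(this, state);
  }

  @Override
  public void close() {
    state.closed.set(true);
    cleanable.clean(); // runs the callback at most once; the closed flag suppresses the warning
  }
}
```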