[ https://issues.apache.org/jira/browse/HADOOP-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578775#comment-17578775 ]
Viraj Jasani edited comment on HADOOP-15944 at 8/12/22 7:24 AM:
----------------------------------------------------------------
{quote}i think closing idle streams would be good, especially input streams.
{quote}
Sounds good, let me see if our current auditing is already able to capture
whether the current input streams are idle.
But I was wondering: if we actually close the HTTP input stream, how would we
still support lazy-seek style operations in S3AInputStream, for instance? In
other words, how do we decide when an input stream is eligible to be treated
as idle? Perhaps that's where we need to do some digging.
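To make the question concrete, here is a minimal sketch of one way a stream could give up its connection while idle and still honour a lazy seek: it remembers the logical position and only reopens on the next read. This is not how S3AInputStream is implemented; `RangeOpener`, `IdleAwareStream` and `closeIfIdle` are hypothetical names made up for illustration, and the opener stands in for whatever would issue a ranged GET.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical: opens a stream over the object starting at the given offset. */
interface RangeOpener {
    InputStream openAt(long position) throws IOException;
}

/** Illustrative only: drops its connection when idle and reopens lazily. */
class IdleAwareStream extends InputStream {
    private final RangeOpener opener;
    private InputStream wrapped;   // null while no connection is held
    private long position;         // logical offset of the next byte to read
    private long lastAccessNanos = System.nanoTime();
    private int reopenCount;

    IdleAwareStream(RangeOpener opener) {
        this.opener = opener;
    }

    /** Lazy seek: records the target only; no connection is opened here. */
    synchronized void seek(long target) throws IOException {
        if (target != position) {
            releaseConnection();
            position = target;
        }
    }

    @Override
    public synchronized int read() throws IOException {
        if (wrapped == null) {
            // First read, or first read after an idle close or a seek.
            wrapped = opener.openAt(position);
            reopenCount++;
        }
        lastAccessNanos = System.nanoTime();
        int b = wrapped.read();
        if (b >= 0) {
            position++;
        }
        return b;
    }

    /** For a hypothetical reaper thread: release the connection if unused for idleNanos. */
    synchronized boolean closeIfIdle(long idleNanos) throws IOException {
        if (wrapped != null && System.nanoTime() - lastAccessNanos >= idleNanos) {
            releaseConnection();
            return true;
        }
        return false;
    }

    synchronized boolean holdsConnection() {
        return wrapped != null;
    }

    synchronized int getReopenCount() {
        return reopenCount;
    }

    private void releaseConnection() throws IOException {
        if (wrapped != null) {
            wrapped.close();
            wrapped = null;
        }
    }

    /** Sketch only: a real stream would also track a closed flag and reject later reads. */
    @Override
    public synchronized void close() throws IOException {
        releaseConnection();
    }
}
```

In this sketch, "idle" simply means "holds a connection but has not been read from for some interval"; the caller never sees the close, only an extra reopen on the next read.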
was (Author: vjasani):
{quote}i think closing idle streams would be good, especially input streams.
{quote}
Sounds good, let me see if our current auditing is already able to capture
whether the current input streams are idle.
> S3AInputStream logging to make it easier to debug file leakage
> --------------------------------------------------------------
>
> Key: HADOOP-15944
> URL: https://issues.apache.org/jira/browse/HADOOP-15944
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.1.1
> Reporter: Steve Loughran
> Priority: Minor
>
> Problem: if an app opens too many input streams, then all the http
> connections in the S3A pool can be used up; all attempts to do other FS
> operations then fail, timing out while waiting for a pooled http connection.
> Proposed simple solution: log better what's going on with input stream
> lifecycle, specifically:
> # include URL of file in open, reopen & close events
> # maybe: Separate logger for these events, though S3A Input stream should be
> enough as it doesn't do much else.
> # maybe: have some prefix in the events like "Lifecycle", so that you could
> use the existing log @ debug, grep for that phrase and look at the printed
> URLs to identify what's going on
> # stream metrics: expose some of the state of the http connection pool and/or
> active input and output streams
> Idle output streams don't use up http connections, as they only connect
> during block upload.
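The logging and metrics ideas in the description could be sketched roughly as follows. This is an illustration under assumptions, not the S3A code: `StreamLifecycleTracker` and its methods are hypothetical, `java.util.logging` at FINE level stands in for the existing debug log, and the message layout is just one possible choice for the "Lifecycle" prefix.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Logger;

/** Hypothetical helper: prefixed lifecycle log lines plus an active-stream gauge. */
class StreamLifecycleTracker {
    // Assumption: a dedicated logger name, so these events can be enabled separately.
    private static final Logger LOG = Logger.getLogger("StreamLifecycle");
    static final String PREFIX = "Lifecycle";

    private final AtomicInteger activeInputStreams = new AtomicInteger();

    /** Builds a greppable message such as "Lifecycle: open s3a://bucket/key". */
    static String format(String event, String uri) {
        return PREFIX + ": " + event + " " + uri;
    }

    String opened(String uri) {
        activeInputStreams.incrementAndGet();
        return log("open", uri);
    }

    String reopened(String uri) {
        // A reopen does not change the number of active streams.
        return log("reopen", uri);
    }

    String closed(String uri) {
        activeInputStreams.decrementAndGet();
        return log("close", uri);
    }

    /** Gauge that could be exposed as a stream metric. */
    int getActiveInputStreams() {
        return activeInputStreams.get();
    }

    private String log(String event, String uri) {
        String msg = format(event, uri);
        LOG.fine(msg); // debug-level; returned as well so callers and tests can inspect it
        return msg;
    }
}
```

With something like this in place, grepping the debug log for "Lifecycle" and comparing open and close lines per URL would show which files are still held open.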