[ https://issues.apache.org/jira/browse/HADOOP-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357390#comment-16357390 ]
Steve Loughran commented on HADOOP-15216:
-----------------------------------------

HADOOP-13761 covers the condition where S3Guard finds the file in its getFileStatus() in the {{FileSystem.open()}} call, but when S3AInputStream initiates the GET a 404 comes back: FNFE should be handled with backoff too
* Maybe: special handling for that first attempt, as an FNFE on later ones probably means someone deleted the file
* The situation of HEAD -> 200, GET -> 404 could also arise if the GET went to a different shard from the HEAD. So the condition could also arise in non-S3Guarded buckets, sometimes

> S3AInputStream to handle reconnect on read() failure better
> -----------------------------------------------------------
>
>                 Key: HADOOP-15216
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15216
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>            Priority: Major
>
> {{S3AInputStream}} handles any IOE through a close() of the stream and a single
> re-invocation of the read, with
> * no backoff
> * no abort of the HTTPS connection, which is just returned to the pool. If
> httpclient hasn't noticed the failure, it may get returned to the caller on
> the next read
>
> Proposed
> * switch to invoker
> * retry policy explicitly for streams (EOF => throw, timeout => close, sleep,
> retry, etc.)
>
> We could think about extending the fault injection to inject stream read
> failures intermittently too, though it would need something in S3AInputStream
> to (optionally) wrap the http input streams with the failing stream.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
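The "backoff on the first GET" idea above can be sketched in plain Java. This is a hypothetical illustration, not the actual S3AInputStream or Invoker code: the `ObjectStore` interface and `openWithBackoff` method are assumed names standing in for the S3 GET path. The key point from the comment is encoded in the loop: an FNFE on the first attempts is retried with exponential backoff (HEAD said the object exists, so the 404 may be consistency lag or a lagging shard), while exhausting the attempts rethrows, treating the object as genuinely deleted.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

public class OpenWithRetry {

    /** Hypothetical stand-in for the S3 GET; throws FileNotFoundException on a 404. */
    interface ObjectStore {
        byte[] get(String key) throws IOException;
    }

    /**
     * Issue the GET, retrying FNFE with exponential backoff for the first
     * few attempts. If the FNFE persists, rethrow: the object was probably
     * really deleted rather than merely not yet visible.
     */
    static byte[] openWithBackoff(ObjectStore store, String key,
                                  int maxAttempts, long initialDelayMs)
            throws IOException, InterruptedException {
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return store.get(key);
            } catch (FileNotFoundException e) {
                if (attempt >= maxAttempts) {
                    throw e;             // give up: treat as a real missing object
                }
                Thread.sleep(delay);     // back off before re-issuing the GET
                delay *= 2;              // exponential backoff between attempts
            }
        }
    }
}
```

A real implementation would hang this off a configurable retry policy (as the issue description proposes) rather than hard-coding attempt counts and delays.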