[ 
https://issues.apache.org/jira/browse/HADOOP-17812?focusedWorklogId=627300&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-627300
 ]

ASF GitHub Bot logged work on HADOOP-17812:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Jul/21 19:50
            Start Date: 23/Jul/21 19:50
    Worklog Time Spent: 10m 
      Work Description: steveloughran edited a comment on pull request #3222:
URL: https://github.com/apache/hadoop/pull/3222#issuecomment-885641933


   *update 2021-07-23-20:49* actually, I may be wrong here. I'll need to write 
the test to see what the outcome is. It would still be best to preserve 
whatever exception did fail, though.
   
   > when wrappedStream is null, the IOException is thrown, then the catch 
block will call onReadFailure to retry.
   
   yes, but the exception raised is an IOE, *not the underlying cause*, so the 
retry logic won't examine the failure; it will simply give up.
   
   If you look at the 
[S3ARetryPolicy](https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ARetryPolicy.java#L176)
 you can see that no attempt is made to retry a generic IOE. Therefore there 
will be precisely one retry attempt (the exception handler), and if that 
doesn't fix it (e.g. the server has not yet recovered): failure.
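
   For illustration only, a minimal sketch (plain JDK, not the actual 
S3ARetryPolicy code) of why wrapping the cause in a plain IOE defeats a 
policy keyed on the exception class:

```java
// A minimal sketch, not the real S3ARetryPolicy: a policy keyed on the
// exact exception class, as described above, never looks at the cause
// chain of a wrapping IOException.
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.HashMap;
import java.util.Map;

class RetryDecisionSketch {
  enum Action { RETRY_WITH_BACKOFF, FAIL }

  private final Map<Class<?>, Action> policies = new HashMap<>();

  RetryDecisionSketch() {
    // Connectivity problems are treated as transient and retried.
    policies.put(SocketTimeoutException.class, Action.RETRY_WITH_BACKOFF);
    // A generic IOException gives no evidence the failure is recoverable.
    policies.put(IOException.class, Action.FAIL);
  }

  Action decide(Exception e) {
    // new IOException(socketTimeout) still maps to FAIL here: the lookup
    // is by the thrown exception's own class, never by its cause.
    return policies.getOrDefault(e.getClass(), Action.FAIL);
  }
}
```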
   
   >  for the code suggestion you gave is the same as the PR
   
   I am proposing that on entry to the method, a full attempt to reconnect is 
made if the stream is null.
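
   A minimal sketch of that proposal; `ReconnectingStream` and `reopenFully()` 
are hypothetical stand-ins for S3AInputStream and its actual reopen path:

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of the proposed entry guard; ReconnectingStream and
// reopenFully() stand in for S3AInputStream and its real reopen logic.
abstract class ReconnectingStream extends InputStream {
  protected InputStream wrappedStream; // null after a failed reconnect

  /** Re-open the underlying connection under the retry policy. */
  protected abstract void reopenFully() throws IOException;

  @Override
  public synchronized int read() throws IOException {
    if (wrappedStream == null) {
      // A previous failure closed the stream and the reconnect never
      // completed: make the full reopen attempt here, where the retry
      // policy applies, instead of letting read() NPE on a null stream.
      reopenFully();
    }
    return wrappedStream.read();
  }
}
```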
   
   In the test which this PR is *going to need*, the issue will become apparent 
if the simulated failure is the following sequence (see the sketch after this 
list):
   
   1. succeed, returning a stream
   2. throw SocketTimeoutException on the first read()
   3. throw ConnectTimeoutException three times
   4. then return a stream whose read() returns a character
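
   A sketch of that simulated sequence, assuming a hypothetical test double 
(`FlakyStreamFactory` is illustrative, not the PR's test code; plain 
SocketTimeoutException stands in for Hadoop's ConnectTimeoutException, which 
extends it):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketTimeoutException;

// Illustrative test double producing the failure sequence above;
// FlakyStreamFactory is hypothetical, not part of the PR.
class FlakyStreamFactory {
  private int opens = 0;

  InputStream open() throws IOException {
    opens++;
    if (opens == 1) {
      // 1-2. the first open succeeds, but its read() times out
      return new InputStream() {
        @Override
        public int read() throws IOException {
          throw new SocketTimeoutException("simulated read timeout");
        }
      };
    }
    if (opens <= 4) {
      // 3. the next three reconnects fail to connect at all
      throw new SocketTimeoutException("simulated connect timeout");
    }
    // 4. the final reconnect succeeds; read() returns a character
    return new ByteArrayInputStream(new byte[] {'a'});
  }
}
```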
   
   With ConnectTimeoutException being raised on the reconnect, the retry policy 
will attempt to connect with backoff, jitter, and a configurable limit. 
Throwing a simple IOE will fail on the first retry.
   
   (the test case should also set up a retry policy with a retry interval of 
0ms so it doesn't trigger any delays)
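
   A sketch of that test configuration, assuming the standard hadoop-aws 
retry keys:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.s3a.S3ARetryPolicy;

// Sketch: a retry policy with a zero interval so the test incurs no real
// delays; "fs.s3a.retry.interval" and "fs.s3a.retry.limit" are the
// standard hadoop-aws retry options.
class ZeroDelayRetryPolicy {
  static S3ARetryPolicy create() {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.retry.interval", "0ms");
    conf.setInt("fs.s3a.retry.limit", 3);
    return new S3ARetryPolicy(conf);
  }
}
```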
   
   +@majdyz


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 627300)
    Time Spent: 1.5h  (was: 1h 20m)

> NPE in S3AInputStream read() after failure to reconnect to store
> ----------------------------------------------------------------
>
>                 Key: HADOOP-17812
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17812
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.2.2, 3.3.1
>            Reporter: Bobby Wang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When [reading from S3a 
> storage|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L450],
>  an SSLException (which extends IOException) can occur, which triggers 
> [onReadFailure|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L458].
> onReadFailure calls "reopen", which first closes the original 
> *wrappedStream* and sets *wrappedStream = null*, then tries to 
> [re-get 
> *wrappedStream*|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L184].
>  But if the preceding code [obtaining the 
> S3Object|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L183]
>  throws an exception, then *wrappedStream* remains null.
> The 
> [retry|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L446]
>  mechanism may then re-execute 
> [wrappedStream.read|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L450]
>  and cause an NPE.
>  
> For more details, please refer to 
> [https://github.com/NVIDIA/spark-rapids/issues/2915]
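
In outline, the failure path the report above describes (a paraphrase, with 
hypothetical names standing in for the real S3AInputStream members):

```java
import java.io.IOException;
import java.io.InputStream;

// Paraphrase of the reported failure path; the names are stand-ins for
// the real S3AInputStream members, not the actual Hadoop code.
abstract class NpePathSketch {
  protected InputStream wrappedStream;

  /** Fetch the object again, e.g. via a new GET request. */
  protected abstract InputStream getS3Object() throws IOException;

  void onReadFailure() throws IOException {
    wrappedStream.close();
    wrappedStream = null;          // cleared before the reconnect...
    wrappedStream = getS3Object(); // ...so if this throws, it stays null
  }

  int readOnce() throws IOException {
    try {
      return wrappedStream.read(); // NPE when a retry re-enters with
                                   // wrappedStream still null
    } catch (IOException e) {
      onReadFailure(); // may itself throw, leaving wrappedStream null
      throw e;
    }
  }
}
```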



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
