[jira] [Created] (HADOOP-15541) AWS SDK can mistake stream timeouts for EOF and throw SdkClientExceptions

Sean Mackrory (JIRA) Thu, 14 Jun 2018 13:18:18 -0700

Sean Mackrory created HADOOP-15541:
--------------------------------------

             Summary: AWS SDK can mistake stream timeouts for EOF and throw 
SdkClientExceptions
                 Key: HADOOP-15541
                 URL: https://issues.apache.org/jira/browse/HADOOP-15541
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Sean Mackrory
            Assignee: Sean Mackrory



I've gotten a few reports of read timeouts not being handled properly in some 
Impala workloads. What happens is the following sequence of events (credit to 
Sailesh Mukil for figuring this out):
 * S3AInputStream.read() gets a SocketTimeoutException when it calls 
wrappedStream.read()
 * This is handled by onReadFailure -> reopen -> closeStream. When we try to 
drain the stream, SdkFilterInputStream.read() in the AWS SDK fails because of 
checkLength. The underlying Apache Commons stream returns -1 both on a timeout 
and on EOF.
 * The SDK assumes the -1 signifies an EOF, so it assumes the number of bytes 
read must equal the expected number of bytes. Because they don't match (it's a 
timeout and not an EOF), it throws an SdkClientException.
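
The sequence above can be reproduced in miniature without the SDK. The sketch 
below is only an illustration, not the SDK's code: ClientException, 
LengthCheckedStream and drainFails are hypothetical stand-ins, and the timeout 
is simulated by an inner stream that returns -1 before the expected number of 
bytes has been delivered.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class LengthCheckSketch {
    /** Hypothetical stand-in for the SDK's unchecked SdkClientException. */
    public static class ClientException extends RuntimeException {
        public ClientException(String msg) { super(msg); }
    }

    /** Hypothetical wrapper that checks the byte count once the inner stream returns -1. */
    public static class LengthCheckedStream extends InputStream {
        private final InputStream in;
        private final long expected;
        private long seen;

        public LengthCheckedStream(InputStream in, long expected) {
            this.in = in;
            this.expected = expected;
        }

        @Override
        public int read() throws IOException {
            int b = in.read();
            if (b == -1) {
                // -1 is taken to mean EOF, so a short count is reported as an error
                // even if the real cause was a timeout.
                if (seen != expected) {
                    throw new ClientException(
                        "read " + seen + " bytes, expected " + expected);
                }
            } else {
                seen++;
            }
            return b;
        }
    }

    /**
     * Drains a stream that delivers `available` bytes and then -1, while the
     * wrapper expects `expected` bytes. Returns true if draining failed with
     * the unchecked exception.
     */
    public static boolean drainFails(int available, long expected) throws IOException {
        InputStream in = new LengthCheckedStream(
            new ByteArrayInputStream(new byte[available]), expected);
        try {
            while (in.read() != -1) { }
            return false;
        } catch (ClientException e) {
            return true;
        }
    }
}
```

With 3 of 10 expected bytes delivered before the -1, the drain fails with the 
unchecked exception rather than an IOException, which is the state the caller 
has to cope with.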

This is tricky to test for without a ton of mocking of AWS SDK internals, 
because you have to get into this conflicting state where the SDK has only read 
a subset of the expected bytes and gets a -1.

closeStream will abort the stream in the event of an IOException when draining. 
We could simply also abort in the event of an SdkClientException. I'm testing 
whether this results in correct functionality in the workloads that seem to hit 
these timeouts a lot, and all the s3a tests continue to work with that change. 
I'm going to open an issue with the AWS SDK Github as well, but I'm not sure 
what the ideal outcome would be unless there's a good way, short of huge 
rewrites, to distinguish between a stream that has timed out and a stream that 
has read all the data.
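
A minimal sketch of the proposed change, assuming a simplified closeStream: 
this is not the actual S3AInputStream code, and ClientException, 
AbortableStream, failingStream and emptyStream are hypothetical names used 
only for illustration. The point is that the drain treats the unchecked SDK 
exception the same way as an IOException and falls back to abort.

```java
import java.io.IOException;
import java.io.InputStream;

public class CloseStreamSketch {
    /** Hypothetical stand-in for the SDK's unchecked SdkClientException. */
    public static class ClientException extends RuntimeException {
        public ClientException(String msg) { super(msg); }
    }

    /** Hypothetical stream that can be closed cleanly or aborted. */
    public abstract static class AbortableStream extends InputStream {
        public boolean aborted;
        public void abort() { aborted = true; }
    }

    /**
     * Tries to drain and close the stream; aborts it if draining fails with
     * either an IOException or the unchecked client exception.
     * Returns true if the stream was aborted.
     */
    public static boolean closeStream(AbortableStream stream) {
        boolean shouldAbort = false;
        try {
            while (stream.read() != -1) { }
            stream.close();
        } catch (IOException | ClientException e) {
            // Previously only IOException led to an abort; the proposal is
            // to handle the SDK's exception the same way.
            shouldAbort = true;
        }
        if (shouldAbort) {
            stream.abort();
        }
        return shouldAbort;
    }

    /** Stream whose drain fails the way the issue describes. */
    public static AbortableStream failingStream() {
        return new AbortableStream() {
            @Override
            public int read() {
                throw new ClientException("length check failed after -1");
            }
        };
    }

    /** Stream that is already at EOF and drains cleanly. */
    public static AbortableStream emptyStream() {
        return new AbortableStream() {
            @Override
            public int read() {
                return -1;
            }
        };
    }
}
```

In this sketch a failed drain is aborted instead of letting the exception 
propagate out of the reopen path, while a clean drain is closed normally.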

 

 



