[
https://issues.apache.org/jira/browse/HADOOP-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16006755#comment-16006755
]
Steve Loughran commented on HADOOP-14303:
-----------------------------------------
I'm thinking that if we target Java 8+, we could do this nicely with
closures, similar to how we now have {{intercept()}} and {{eventually()}} in
LambdaTestUtils: some method which would take an IOE-raising closure and a
retry policy, and repeat the closure until either it succeeded or the retry
policy gave up. On success it would return the result of the operation:
{code}
MultipartUpload mpu = execute(new RetryAllButAuth(),
    () -> initiateMultipartUpload());
{code}
This would really keep those try/catch/while loops under control, and make
retries a lot easier to get right in what is becoming a fairly complex piece
of code.
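As a strawman, here is roughly what that helper could look like. Everything
in this sketch is illustrative rather than an existing API: the
{{RetryPolicy}} and {{IOCallable}} interfaces and the {{Invoker.execute()}}
method are just one possible shape.
{code}
import java.io.IOException;

/** Illustrative only: decides whether to retry, sleeping as needed. */
interface RetryPolicy {
  boolean shouldRetry(IOException e, int attempts) throws InterruptedException;
}

/** A closure which may raise an IOException. */
@FunctionalInterface
interface IOCallable<T> {
  T call() throws IOException;
}

final class Invoker {
  /** Repeat the closure until it succeeds or the policy gives up. */
  static <T> T execute(RetryPolicy policy, IOCallable<T> operation)
      throws IOException {
    int attempts = 0;
    while (true) {
      try {
        return operation.call();      // success: return the result
      } catch (IOException e) {
        attempts++;
        try {
          if (!policy.shouldRetry(e, attempts)) {
            throw e;                  // policy gave up: rethrow last failure
          }
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw e;                    // interrupted while backing off
        }
      }
    }
  }
}
{code}
A {{RetryAllButAuth}} policy would then just be one implementation:
return false for auth failures, sleep-then-return-true for everything else.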
> Review retry logic on all S3 SDK calls, implement where needed
> --------------------------------------------------------------
>
> Key: HADOOP-14303
> URL: https://issues.apache.org/jira/browse/HADOOP-14303
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
>
> AWS S3, IAM, KMS, DDB etc. all throttle callers: the S3A code needs to
> handle this without failing, since if it slows down its requests it can
> recover.
> 1. Look at all the places where S3A is calling S3 via the AWS SDK and make
> sure we are retrying with some backoff & jitter policy, ideally something
> unified; a sketch of one possible backoff calculation follows after this
> list. This must be more systematic than the case-by-case,
> problem-by-problem strategy we are implicitly using.
> 2. Many of the AWS S3 SDK calls do implement retry (e.g. PUT/multipart PUT),
> but we need to check the other parts of the process: login, initiate/complete
> MPU, ...
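> For item 1, a sketch of what a unified backoff & jitter calculation might
> look like; the class, method and numbers here are illustrative only, not
> an existing API. It uses capped exponential backoff with "full jitter",
> so throttled clients don't all retry in lockstep:
> {code}
> import java.util.concurrent.ThreadLocalRandom;
>
> /** Sketch only: capped exponential backoff with full jitter. */
> final class RetryBackoff {
>   private RetryBackoff() {
>   }
>
>   /**
>    * @param attempt     retry attempt number, starting at 1
>    * @param baseDelayMs backoff ceiling for the first attempt
>    * @param maxDelayMs  absolute backoff ceiling
>    * @return how long to sleep before the next attempt
>    */
>   static long delayMs(int attempt, long baseDelayMs, long maxDelayMs) {
>     long exp = baseDelayMs << Math.min(attempt - 1, 20);  // avoid overflow
>     long ceiling = Math.min(maxDelayMs, exp);
>     // full jitter: sleep a uniform random time in [0, ceiling]
>     return ThreadLocalRandom.current().nextLong(ceiling + 1);
>   }
> }
> {code}
> With, say, a 100ms base and a 30s cap, attempt 1 sleeps up to 100ms and
> attempt 9 up to 25.6s, after which the cap holds the ceiling at 30s.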
> Related
> HADOOP-13811 Failed to sanitize XML document destined for handler class
> HADOOP-13664 S3AInputStream to use a retry policy on read failures
> This stuff is all hard to test. A key need is to be able to differentiate
> recoverable throttle & network failures from unrecoverable problems like
> auth, network config (e.g. bad endpoint), etc.
> This may be the opportunity to add a faulting subclass of the Amazon S3
> client which can be configured in IT tests to fail at specific points. Ryan
> Blue's mock S3 client does this in HADOOP-13786, but it is a 100% mock. I'm
> thinking of something with similar fault raising, but in front of the real
> S3 client; a sketch of the shape of such a class follows.
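> The class below assumes the v1 AWS SDK; it injects a throttle-like failure
> into the first N PUTs before delegating to the real implementation. The
> name {{FaultingAmazonS3Client}} and the fault-count wiring are
> hypothetical:
> {code}
> import com.amazonaws.AmazonServiceException;
> import com.amazonaws.services.s3.AmazonS3Client;
> import com.amazonaws.services.s3.model.PutObjectRequest;
> import com.amazonaws.services.s3.model.PutObjectResult;
> import java.util.concurrent.atomic.AtomicInteger;
>
> /** Sketch: fail the first N PUTs, then pass through to the real client. */
> class FaultingAmazonS3Client extends AmazonS3Client {
>   private final AtomicInteger putFaultsLeft;
>
>   FaultingAmazonS3Client(int putFaults) {
>     this.putFaultsLeft = new AtomicInteger(putFaults);
>   }
>
>   @Override
>   public PutObjectResult putObject(PutObjectRequest request) {
>     if (putFaultsLeft.getAndDecrement() > 0) {
>       // simulate S3 throttling: a 503 "SlowDown" service exception
>       AmazonServiceException ase =
>           new AmazonServiceException("injected throttle for testing");
>       ase.setStatusCode(503);
>       ase.setErrorCode("SlowDown");
>       throw ase;
>     }
>     return super.putObject(request);
>   }
> }
> {code}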