[ https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063053#comment-16063053 ]
Steve Loughran commented on HADOOP-13761:
-----------------------------------------
I've been doing some closure-based wrapping of FS operations in the moved
{{WriteOperationsHelper}} of HADOOP-13786, in
[AwsLambda|https://github.com/steveloughran/hadoop/blob/s3guard/HADOOP-13786-committer/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/AwsLambda.java].
I think I'd like to put more general AWS retry logic in place here: imagine
creating an instance of a Hadoop retry policy which could be passed in; the
DDB calls would then just use a different policy.
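A minimal sketch of that shape, assuming the existing
{{org.apache.hadoop.io.retry.RetryPolicy}}; the {{Operation}} closure type and
the {{RetryingInvoker}} name are illustrative, not the current
{{AwsLambda}}/{{WriteOperationHelper}} API:

{code:java}
import java.io.IOException;
import java.io.InterruptedIOException;

import org.apache.hadoop.io.retry.RetryPolicy;

public class RetryingInvoker {

  /** Closure around a single S3/DDB/STS call. */
  @FunctionalInterface
  public interface Operation<T> {
    T execute() throws IOException;
  }

  private final RetryPolicy policy;

  public RetryingInvoker(RetryPolicy policy) {
    this.policy = policy;
  }

  /** Invoke the closure, retrying as the supplied policy dictates. */
  public <T> T retry(String description, Operation<T> operation)
      throws IOException {
    int failures = 0;
    while (true) {
      try {
        return operation.execute();
      } catch (IOException e) {
        RetryPolicy.RetryAction action;
        try {
          action = policy.shouldRetry(e, ++failures, 0, true);
        } catch (Exception policyFailure) {
          throw e;  // the policy itself failed; surface the original error
        }
        if (action.action != RetryPolicy.RetryAction.RetryDecision.RETRY) {
          throw e;
        }
        try {
          Thread.sleep(action.delayMillis);
        } catch (InterruptedException interrupted) {
          Thread.currentThread().interrupt();
          throw (IOException) new InterruptedIOException(
              "Interrupted while retrying " + description).initCause(e);
        }
      }
    }
  }
}
{code}

S3 calls would then get one policy instance (e.g. from
{{RetryPolicies.retryUpToMaximumCountWithFixedSleep()}}) and the DynamoDB
metadata store calls a different one, without the wrapped operations knowing
the difference.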
Linking to HADOOP-14012, which proposes that TranslateException handle AWS
exceptions better. I concur, but it's broader than just DDB, as STS/KMS
exceptions get translated badly too: we're mapping from HTTP status codes to
file IO errors, when the other services may really be indicating different
issues. There is enough detail inside the exceptions for more specific parsing
to handle; something like a separate translator for the AWS exceptions of each
specific service, giving each one the ability to interpret its own exceptions.
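To sketch that shape (the interface and the DynamoDB translator class are
hypothetical, not existing code, and the service-name string is illustrative):

{code:java}
import java.io.IOException;

import com.amazonaws.AmazonClientException;
import com.amazonaws.AmazonServiceException;

/** One translator per AWS service, so each can interpret its own errors. */
interface AwsExceptionTranslator {

  /** Does this translator understand the exception? */
  boolean accepts(AmazonClientException e);

  /** Map the SDK exception onto the most meaningful IOException. */
  IOException translate(String operation, String path,
      AmazonClientException e);
}

/** Example service-specific translator. */
class DynamoDBExceptionTranslator implements AwsExceptionTranslator {

  @Override
  public boolean accepts(AmazonClientException e) {
    // match on the service name reported by the SDK (string is illustrative)
    return e instanceof AmazonServiceException
        && "AmazonDynamoDBv2".equals(
            ((AmazonServiceException) e).getServiceName());
  }

  @Override
  public IOException translate(String operation, String path,
      AmazonClientException e) {
    AmazonServiceException ase = (AmazonServiceException) e;
    // throughput exceeded is a retriable capacity problem, not a file error
    if ("ProvisionedThroughputExceededException".equals(ase.getErrorCode())) {
      return new IOException(operation + " on " + path
          + ": DynamoDB throughput exceeded", ase);
    }
    return new IOException(operation + " on " + path + ": " + ase, ase);
  }
}
{code}

The shared translation entry point would walk a list of these and only fall
back to today's HTTP-status-to-IOException mapping when no service-specific
translator accepts the exception.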
> S3Guard: implement retries
> ---------------------------
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: HADOOP-13345
> Reporter: Aaron Fabbri
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places where retry
> loops are needed, including:
> - open(path). If MetadataStore reflects recent create/move of file path, but
> we fail to read it from S3, retry.
> - delete(path). If deleteObject() on S3 fails, but MetadataStore shows the
> file exists, retry.
> - rename(src,dest). If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will
> create a separate JIRA for this as it will likely require interface changes
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing
> to make sure we're covered. Failure injection tests can be a separate JIRA
> to make this easier to review.
> We also need basic configuration parameters around retry policy. There
> should be a way to specify a maximum retry duration, as some applications
> would prefer to receive an error eventually rather than wait indefinitely.
> We should also keep statistics on how often inconsistency is detected and a
> retry loop is entered.
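The open(path) case from the list above, with the bounded retry duration and
the statistics counter just described, might look roughly like this; the
duration value, the counter and the {{readObject()}} /
{{metadataStoreSaysExists()}} helpers are invented here for illustration:

{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InterruptedIOException;

class InconsistencyRetry {

  /** e.g. a new fs.s3a S3Guard retry duration option, in milliseconds. */
  private final long maxRetryDurationMs;

  /** Statistic: number of times an inconsistency retry loop was entered. */
  private long inconsistencyRetries;

  InconsistencyRetry(long maxRetryDurationMs) {
    this.maxRetryDurationMs = maxRetryDurationMs;
  }

  /**
   * open(path)-style case: the MetadataStore says the object exists but the
   * S3 read fails, so retry until the configured duration limit is reached.
   */
  byte[] readWithRetry(String path) throws IOException {
    long deadline = System.currentTimeMillis() + maxRetryDurationMs;
    IOException lastFailure = null;
    while (System.currentTimeMillis() < deadline) {
      try {
        return readObject(path);
      } catch (FileNotFoundException e) {
        if (!metadataStoreSaysExists(path)) {
          throw e;                 // genuinely missing: fail immediately
        }
        inconsistencyRetries++;    // count the inconsistency for statistics
        lastFailure = e;
        try {
          Thread.sleep(250);       // fixed backoff, purely illustrative
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new InterruptedIOException("Interrupted retrying " + path);
        }
      }
    }
    // surface an error eventually rather than wait indefinitely
    throw lastFailure != null ? lastFailure : new FileNotFoundException(path);
  }

  private byte[] readObject(String path) throws IOException {
    throw new UnsupportedOperationException("illustrative stub");
  }

  private boolean metadataStoreSaysExists(String path) {
    throw new UnsupportedOperationException("illustrative stub");
  }
}
{code}

The delete() and rename() cases would reuse the same loop shape with a
different probe of the MetadataStore.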