[ https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369187#comment-16369187 ]

Steve Loughran commented on HADOOP-13761:
-----------------------------------------

bq. Or just call them Retry with a clarifying comment–since they "are covered 
WRT retries that we actually want".  I'm thinking the latter.

+1

# I believe throttling can reject any new HTTP request issued over an
already-open HTTP/1.1 connection, whatever its verb. That is: every request to
a shard is checked against that shard's tracking of the AWS account's load,
and throttled if need be.
# The same applies to requests made to AWS KMS.
# Things like bandwidth throttling are more likely to happen between the EC2 VM
and the store.
# Load balancers can get overloaded; AWS recommend not caching DNS entries for
very long.
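Since any request may be throttled, not just the first on a connection, every call needs a retry wrapper. A minimal sketch of exponential backoff with full jitter follows; `ThrottledException` and `ThrottleRetry` are hypothetical names standing in for whatever the S3/DDB/KMS client actually throws on a throttled response, and the base/cap/attempt values are assumptions, not the patch's settings.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical marker for a throttled response (e.g. HTTP 503 / throughput exceeded). */
class ThrottledException extends Exception {
    ThrottledException(String msg) { super(msg); }
}

final class ThrottleRetry {
    private ThrottleRetry() {}

    /** Full-jitter delay: uniform in [0, min(cap, base * 2^attempt)] milliseconds. */
    static long backoffMillis(int attempt, long baseMillis, long capMillis) {
        long exp = baseMillis << Math.min(attempt, 30); // clamp the shift to limit overflow
        long ceiling = Math.min(capMillis, exp);
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }

    /** Runs op, sleeping and retrying on throttling, up to maxAttempts calls in total. */
    static <T> T call(Callable<T> op, int maxAttempts, long baseMillis, long capMillis)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (ThrottledException e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e; // out of attempts: surface the throttling to the caller
                }
                Thread.sleep(backoffMillis(attempt, baseMillis, capMillis));
            }
        }
    }
}
```

Jitter spreads retries from many clients across the window, so they don't all hit the same throttled shard again at the same instant.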

One failure mode to consider is delete + create inconsistency. The retry logic
in this patch will catch the sequence PUT, DELETE, PUT, GET: if the second PUT
hasn't propagated yet, the GET doesn't find the object and the retry loop spins
until it appears. It will not handle the case where the GET still returns the
data from the initial PUT. We don't have any explicit tests for that situation,
and even if we did, there is not much that could be done about it.
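The asymmetry above can be shown with a minimal polling sketch, assuming a read that yields `Optional.empty()` when the object is not (yet) visible. `AwaitVisible` is a hypothetical helper, not the patch's code: it can spin past a missing object, but a stale value from the first PUT looks exactly like a successful read and is returned at once.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class AwaitVisible {
    private AwaitVisible() {}

    /**
     * Polls read until it returns a value or maxPolls reads have been made.
     * Only "missing" is detectable; a stale-but-present value is returned immediately.
     */
    static <T> Optional<T> await(Supplier<Optional<T>> read, int maxPolls, long sleepMillis)
            throws InterruptedException {
        for (int i = 0; i < maxPolls; i++) {
            Optional<T> v = read.get();
            if (v.isPresent()) {
                return v;
            }
            if (i + 1 < maxPolls) {
                Thread.sleep(sleepMillis); // wait for the write to propagate
            }
        }
        return Optional.empty(); // never became visible within the poll budget
    }
}
```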





> S3Guard: implement retries for DDB failures and throttling; translate exceptions
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-13761
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13761
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Aaron Fabbri
>            Assignee: Aaron Fabbri
>            Priority: Blocker
>         Attachments: HADOOP-13761-004-to-005.patch, 
> HADOOP-13761-005-to-006-approx.diff.txt, HADOOP-13761-005.patch, 
> HADOOP-13761-006.patch, HADOOP-13761.001.patch, HADOOP-13761.002.patch, 
> HADOOP-13761.003.patch, HADOOP-13761.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify a maximum retry duration, as some applications
> would prefer to receive an error eventually rather than wait indefinitely.  We should
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.
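The retry-policy requirements in the description (a maximum retry duration, plus statistics whenever an inconsistency triggers a retry) could look roughly like the sketch below. `RetryBudget` and its methods are hypothetical, not S3Guard's actual configuration or statistics API; the clock is injected so the deadline logic can be tested without sleeping.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/**
 * Hypothetical retry budget: callers stop retrying once maxDuration has
 * elapsed, and every detected inconsistency is counted as a statistic.
 */
final class RetryBudget {
    private final long maxNanos;
    private final LongSupplier nanoClock; // e.g. System::nanoTime; injectable for tests
    private final AtomicLong inconsistencies = new AtomicLong();

    RetryBudget(Duration maxDuration, LongSupplier nanoClock) {
        this.maxNanos = maxDuration.toNanos();
        this.nanoClock = nanoClock;
    }

    /** Marks the start of a retry loop; pass the result to onInconsistency(). */
    long start() {
        return nanoClock.getAsLong();
    }

    /**
     * Records that an inconsistency was detected. Returns true if the caller
     * may retry, false if the budget is spent and an error should be raised.
     */
    boolean onInconsistency(long startNanos) {
        inconsistencies.incrementAndGet();
        return nanoClock.getAsLong() - startNanos < maxNanos;
    }

    /** Total inconsistencies seen, suitable for exposing as a statistic. */
    long inconsistencyCount() {
        return inconsistencies.get();
    }
}
```

In an `open(path)` loop, a caller would take `long t0 = budget.start()` and, each time the MetadataStore says the file exists but S3 returns nothing, retry only while `budget.onInconsistency(t0)` returns true.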



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
