[
https://issues.apache.org/jira/browse/HADOOP-17015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093453#comment-17093453
]
Steve Loughran commented on HADOOP-17015:
-----------------------------------------
I'm going to point you at org.apache.hadoop.fs.s3a.S3ARetryPolicy as an example
of exception-specific retry policies which is fairly declarative and easy to
extend. It goes with org.apache.hadoop.fs.s3a.Invoker for wrapping operations, and
is subclassed by S3GuardExistsRetryPolicy and S3GuardDataAccessRetryPolicy for
retries on FileNotFoundExceptions when S3 is believed to be inconsistent.
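To illustrate what "declarative" means here, the following is a minimal sketch of an exception-to-decision table with a wrapper that applies it. It is not the S3ARetryPolicy or Invoker code: the class ExceptionRetryTable, its methods, and the Decision enum are hypothetical names made up for this example, and backoff between attempts is left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

/** Hypothetical sketch: a declarative table mapping exception types to retry decisions. */
final class ExceptionRetryTable {
  enum Decision { FAIL, RETRY }

  // Insertion-ordered, so rules are checked in the order they were declared.
  private final Map<Class<? extends Exception>, Decision> rules = new LinkedHashMap<>();
  private final int maxAttempts;

  ExceptionRetryTable(int maxAttempts) {
    this.maxAttempts = maxAttempts;
  }

  /** Declare the decision for an exception type; the first rule declared for a type wins. */
  ExceptionRetryTable on(Class<? extends Exception> type, Decision decision) {
    rules.putIfAbsent(type, decision);
    return this;
  }

  /** Return the decision of the first declared rule matching the exception, else FAIL. */
  Decision decide(Exception e) {
    for (Map.Entry<Class<? extends Exception>, Decision> rule : rules.entrySet()) {
      if (rule.getKey().isInstance(e)) {
        return rule.getValue();
      }
    }
    return Decision.FAIL;
  }

  /** Run the operation, retrying while the table says RETRY and attempts remain. */
  <T> T invoke(Callable<T> operation) throws Exception {
    int attempt = 1;
    while (true) {
      try {
        return operation.call();
      } catch (Exception e) {
        if (attempt >= maxAttempts || decide(e) == Decision.FAIL) {
          throw e;
        }
        attempt++;
      }
    }
  }
}
```

A base table might declare FileNotFoundException as FAIL ahead of a general IOException RETRY rule, while a variant used when the store is believed to be inconsistent declares FileNotFoundException as RETRY instead.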
There are some dangerous issues related to idempotency here; HDFS-4872
summarises the HDFS perspective.
S3A does consider delete to be idempotent, but that's explicitly declared in
the constant
{code}
public static final boolean DELETE_CONSIDERED_IDEMPOTENT = true;
{code}
so you can see where that assumption is being made.
The key issue is race conditions with other clients.
# you should really be looking on an operation-by-operation basis at what is
idempotent, and make that policy visible
# unless there's a way to tell the service to ignore resent requests (request
ID?), you either need to implement that idempotency yourself or everyone needs
to agree that certain race conditions (file created, deleted, second file
created, delete retried) can be lossy.
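The race in the second point can be replayed against a toy in-memory store. This is only a sketch: DedupingStore, DeleteRetryRace and the request-ID parameter are hypothetical, and whether a real service can deduplicate resent requests this way is exactly the open question above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory store; not a real ABFS or S3 API. */
final class DedupingStore {
  private final Map<String, String> files = new HashMap<>();
  private final Set<String> seenRequestIds = new HashSet<>();

  void create(String path, String data) {
    files.put(path, data);
  }

  /** Delete without request tracking: a resent request is applied again. */
  void delete(String path) {
    files.remove(path);
  }

  /** Delete keyed by a client-chosen request ID: a resend of the same ID is ignored. */
  void delete(String path, String requestId) {
    if (seenRequestIds.add(requestId)) {
      files.remove(path);
    }
  }

  boolean exists(String path) {
    return files.containsKey(path);
  }
}

final class DeleteRetryRace {
  /** Replays: file created, deleted (reply lost), second file created, delete retried. */
  static boolean secondFileSurvives(boolean useRequestId) {
    DedupingStore store = new DedupingStore();
    String path = "/data/file";
    store.create(path, "first");
    // First delete reaches the store, but the client never sees the reply.
    if (useRequestId) {
      store.delete(path, "req-1");
    } else {
      store.delete(path);
    }
    // Another client creates a new file at the same path.
    store.create(path, "second");
    // The first client retries what it believes is a failed delete.
    if (useRequestId) {
      store.delete(path, "req-1");
    } else {
      store.delete(path);
    }
    return store.exists(path);
  }
}
```

Without request IDs the retried delete removes the second client's file; with them the resend is ignored and the file survives. If the service offers nothing like this, treating delete as idempotent means accepting that loss.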
For rename, that policy sounds good. If there was a fetch of something (etag?)
which would be valid on the renamed object, then you'd know whether or not the
operation had already succeeded.
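A rough sketch of that rename check, under the assumption that the store preserves an object's etag across a rename, which would need to be confirmed for the real service. The Store interface and renameWithRecovery are hypothetical names for illustration, not the ABFS driver's API.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

final class RenameRecovery {
  /** Hypothetical minimal store; real ABFS/S3A clients expose different APIs. */
  interface Store {
    /** Returns the etag of the object at path, or null if nothing is there. */
    String getEtag(String path) throws IOException;

    void rename(String src, String dst) throws IOException;
  }

  /**
   * Rename with retries. If a retry reports "source not found" and the
   * destination carries the etag recorded from the source beforehand, treat
   * the earlier attempt as having succeeded.
   * Assumption: etags are preserved across rename.
   */
  static void renameWithRecovery(Store store, String src, String dst, int maxAttempts)
      throws IOException {
    String sourceEtag = store.getEtag(src);
    if (sourceEtag == null) {
      throw new FileNotFoundException(src);
    }
    for (int attempt = 1; ; attempt++) {
      try {
        store.rename(src, dst);
        return;
      } catch (FileNotFoundException e) {
        // Only on a retry can a missing source mean "already renamed".
        if (attempt > 1 && sourceEtag.equals(store.getEtag(dst))) {
          return;
        }
        throw e;
      } catch (IOException e) {
        // Ambiguous failure (e.g. timeout): retry while attempts remain.
        if (attempt >= maxAttempts) {
          throw e;
        }
      }
    }
  }
}
```

Comparing etags rather than only checking that the destination exists narrows the window in which another client's file at the destination is mistaken for the result of this rename.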
> ABFS: Make PUT and POST operations idempotent
> ---------------------------------------------
>
> Key: HADOOP-17015
> URL: https://issues.apache.org/jira/browse/HADOOP-17015
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.2.1
> Reporter: Sneha Vijayarajan
> Assignee: Sneha Vijayarajan
> Priority: Major
> Fix For: 3.4.0
>
>
> Currently, when a PUT or POST operation times out and the server has already
> successfully executed the operation, there is no check in the driver to see
> whether the operation succeeded; it just retries the same operation again.
> This can cause the driver to throw invalid user errors.
>
> Sample scenario:
> # Rename request times out, though the server has successfully executed the
> operation.
> # Driver retries rename and gets a source not found error.
> In this scenario, the driver needs to check whether rename is being retried
> and report success if the source is not found but the destination is present.
>