[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

Kitti Nanasi (JIRA) Thu, 13 Dec 2018 12:24:21 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720572#comment-16720572
 ]


Kitti Nanasi commented on HDFS-14134:
-------------------------------------

The relevant part is the following:
{quote}in FailoverOnNetworkExceptionRetry#shouldRetry we don't fail-over and 
retry if we're making a non-idempotent call and there's an IOException or 
SocketException that's not Connect, NoRouteToHost, UnknownHost, or Standby. The 
rationale of course is that the operation may have reached the server and 
retrying elsewhere could leave us in an inconsistent state. This means if a 
client doing a create/delete gets a SocketTimeoutException (which is an 
IOE) or an EOF SocketException, the exception will be thrown all the way up to 
the caller of FileSystem/FileContext. That's reasonable because only the user 
of the API at this level has sufficient knowledge of how to handle the failure, 
e.g. if they get such an exception after issuing a delete they can check if the 
file still exists and if so re-issue the delete (however they may also not want 
to do this, and FileContext doesn't know which).
{quote}
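
As a rough illustration of the rule quoted above, here is a minimal Java sketch. It is not the Hadoop source: the class RetryDecisionSketch, the method decide, and the nested StandbyException are hypothetical stand-ins (the real StandbyException lives in Hadoop), and returning FAIL for anything that is not an IOException is an assumption made only for this example.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.UnknownHostException;

// Simplified model of the decision described in the quote; not the Hadoop code.
final class RetryDecisionSketch {

    enum Action { FAIL, FAILOVER_AND_RETRY }

    // Hypothetical stand-in for Hadoop's StandbyException, so the sketch
    // compiles without any Hadoop dependency.
    static class StandbyException extends IOException {
        StandbyException(String message) { super(message); }
    }

    // The four exception types the quote lists as safe to fail over on,
    // even for non-idempotent calls.
    static boolean isSafeToFailOver(Exception e) {
        return e instanceof ConnectException
            || e instanceof NoRouteToHostException
            || e instanceof UnknownHostException
            || e instanceof StandbyException;
    }

    static Action decide(Exception e, boolean idempotent) {
        if (isSafeToFailOver(e)) {
            return Action.FAILOVER_AND_RETRY;
        }
        if (e instanceof IOException) {
            // The call may have reached the server, so only an idempotent
            // operation can be repeated elsewhere without risk.
            return idempotent ? Action.FAILOVER_AND_RETRY : Action.FAIL;
        }
        // Assumption for this sketch: everything else fails immediately.
        return Action.FAIL;
    }
}
```

Under these assumptions, a SocketTimeoutException on a non-idempotent create/delete yields FAIL and is surfaced to the caller, while a ConnectException still results in a failover.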

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-14134
>                 URL: https://issues.apache.org/jira/browse/HDFS-14134
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, hdfs-client, ipc
>            Reporter: Lukas Majercak
>            Assignee: Lukas Majercak
>            Priority: Critical
>         Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file) where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Ideally, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.
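
The precedence problem mentioned at the end of the description can be sketched in a few lines of Java. This is an illustrative model only, not the RetryInvocationHandler implementation: the class ActionPrecedenceSketch and the method combine are hypothetical, only the two actions named in the description are modelled, and treating an empty list as FAIL is an assumption.

```java
import java.util.List;

// Illustrative model of "FAILOVER_AND_RETRY takes precedence over FAIL".
final class ActionPrecedenceSketch {

    enum Action { FAIL, FAILOVER_AND_RETRY }

    // Combine the per-request actions into one decision: a single
    // FAILOVER_AND_RETRY outweighs any number of FAIL results.
    static Action combine(List<Action> actions) {
        for (Action action : actions) {
            if (action == Action.FAILOVER_AND_RETRY) {
                return Action.FAILOVER_AND_RETRY;
            }
        }
        // Assumption for this sketch: no actions, or only FAILs, means FAIL.
        return Action.FAIL;
    }
}
```

With this rule, one request classified as FAILOVER_AND_RETRY is enough to keep the client retrying, even when every other request was classified as FAIL.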



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
