[
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720572#comment-16720572
]
Kitti Nanasi commented on HDFS-14134:
-------------------------------------
The relevant part is the following:
{quote}in FailoverOnNetworkExceptionRetry#shouldRetry we don't fail-over and
retry if we're making a non-idempotent call and there's an IOException or
SocketException that's not Connect, NoRouteToHost, UnknownHost, or Standby. The
rationale of course is that the operation may have reached the server and
retrying elsewhere could leave us in an inconsistent state. This means that if a
client doing a create/delete gets a SocketTimeoutException (which is an
IOE) or an EOF SocketException, the exception will be thrown all the way up to
the caller of FileSystem/FileContext. That's reasonable because only the user
of the API at this level has sufficient knowledge of how to handle the failure,
e.g. if they get such an exception after issuing a delete they can check if the
file still exists and if so re-issue the delete (however they may also not want
to do this, and FileContext doesn't know which).
{quote}
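To make the quoted rule concrete, here is a minimal sketch of the decision it describes. This is not the actual FailoverOnNetworkExceptionRetry code: the class RetryDecisionSketch, the Decision enum, the decide method, and the nested StandbyException are hypothetical stand-ins so that the example compiles without Hadoop on the classpath. The idempotent branch is an assumption based on the issue description, which reports such IOExceptions being evaluated as FAILOVER_AND_RETRY.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

public class RetryDecisionSketch {

    /** Simplified, hypothetical stand-in for the retry actions named in the issue. */
    public enum Decision { FAIL, FAILOVER_AND_RETRY }

    /** Hypothetical stand-in for Hadoop's StandbyException, defined here only for self-containment. */
    public static class StandbyException extends IOException {
        public StandbyException(String msg) { super(msg); }
    }

    /**
     * Sketch of the rule from the quote: connection-level failures and a
     * standby response mean the request did not take effect on this server,
     * so failing over is safe. Any other IOException may have reached the
     * server, so a non-idempotent call is failed back to the caller.
     */
    public static Decision decide(Exception e, boolean idempotent) {
        if (e instanceof ConnectException
                || e instanceof NoRouteToHostException
                || e instanceof UnknownHostException
                || e instanceof StandbyException) {
            return Decision.FAILOVER_AND_RETRY;
        }
        if (e instanceof IOException) {
            // Assumption: idempotent calls are retried elsewhere, others are not.
            return idempotent ? Decision.FAILOVER_AND_RETRY : Decision.FAIL;
        }
        // Anything that is not an IOException is not retried in this sketch.
        return Decision.FAIL;
    }
}
```

Under these assumptions, a create/delete that hits a SocketTimeoutException yields FAIL and the exception surfaces to the caller, while an idempotent call that gets a generic IOException from the NameNode (the getXAttr case in this issue) yields FAILOVER_AND_RETRY.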
> Idempotent operations throwing RemoteException should not be retried by the
> client
> ----------------------------------------------------------------------------------
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs, hdfs-client, ipc
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch,
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch,
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail
> fast.
> For example, when calling getXAttr("user.some_attr", file) where the file
> does not have the attribute, NN throws an IOException with message "could not
> find attr". The current client retry policy determines the action for that to
> be FAILOVER_AND_RETRY. The client then fails over and retries until it
> reaches the maximum number of retries. Ideally, the client should be able
> to tell that this exception is normal and fail fast.
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes
> precedence over FAIL action.