[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828886#comment-16828886 ]
Yuxuan Wang commented on HDFS-14134:
------------------------------------

Hello, is anyone working on this?

I found a bug in {{org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider}}. Just like [~lukmajercak] said:
{quote}Also note that previously, if a hedging request got FAILOVER_RETRY and some request got SocketExc on nonidempotent operation (e.g. FAIL), the client would still pick FAILOVER_RETRY over FAIL, so i think we are fixing an issue here as well.
{quote}
But more than this, the standby namenode will always throw back a StandbyException, which results in a {{FAILOVER_AND_RETRY}} action. That action will mask every action that has a lower order than {{FAILOVER_AND_RETRY}}, such as {{RETRY}} in [^HDFS-14134.007.patch].

I mean, the correct order should be {{Ordering: FAILOVER_AND_RETRY < RETRY < FAIL}}, right?

> Idempotent operations throwing RemoteException should not be retried by the
> client
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-14134
>                 URL: https://issues.apache.org/jira/browse/HDFS-14134
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, hdfs-client, ipc
>            Reporter: Lukas Majercak
>            Assignee: Lukas Majercak
>            Priority: Critical
>         Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch,
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch,
> HDFS-14134.006.patch, HDFS-14134.007.patch,
> HDFS-14134_retrypolicy_change_proposal.pdf,
> HDFS-14134_retrypolicy_change_proposal_1.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail
> fast.
> For example, when calling getXAttr("user.some_attr", file) where the file
> does not have the attribute, NN throws an IOException with message "could not
> find attr". The current client retry policy determines the action for that to
> be FAILOVER_AND_RETRY. The client then fails over and retries until it
> reaches the maximum number of retries. Supposedly, the client should be able
> to tell that this exception is normal and fail fast.
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes
> precedence over the FAIL action.
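
To illustrate the ordering question raised in the comment, here is a minimal Java sketch of how the decisions from several hedged requests could be combined by taking the highest one. The class {{HedgedDecisionSketch}}, the enum {{Decision}} and the method {{combine}} are hypothetical names made up for this example. This is not the actual Hadoop RetryPolicy or RetryInvocationHandler code, and it assumes the ordering FAILOVER_AND_RETRY < RETRY < FAIL proposed above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only; not the Hadoop RetryPolicy / RetryInvocationHandler API.
final class HedgedDecisionSketch {

  // Declared from lowest to highest precedence, so the enum's natural order
  // encodes the assumed ordering FAILOVER_AND_RETRY < RETRY < FAIL.
  enum Decision { FAILOVER_AND_RETRY, RETRY, FAIL }

  // Returns the highest-precedence decision among all hedged responses.
  static Decision combine(List<Decision> decisions) {
    if (decisions.isEmpty()) {
      throw new IllegalArgumentException("no decisions to combine");
    }
    Decision result = decisions.get(0);
    for (Decision d : decisions) {
      // Enum.compareTo compares declaration order (ordinals).
      if (d.compareTo(result) > 0) {
        result = d;
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // The standby answers with StandbyException (FAILOVER_AND_RETRY),
    // while another request says the call should fail.
    System.out.println(
        combine(Arrays.asList(Decision.FAILOVER_AND_RETRY, Decision.FAIL)));
    // The standby answers with StandbyException, another request says RETRY.
    System.out.println(
        combine(Arrays.asList(Decision.FAILOVER_AND_RETRY, Decision.RETRY)));
  }
}
```

Under this assumed ordering, a StandbyException from the standby namenode can no longer mask a RETRY or FAIL decision that comes from another request.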