[ https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015940#comment-14015940 ]
Hudson commented on HADOOP-10630:
---------------------------------
SUCCESS: Integrated in Hadoop-trunk-Commit #5644 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/5644/])
HADOOP-10630. Possible race condition in RetryInvocationHandler. Contributed by
Jing Zhao. (jing9:
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1599366)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java
> Possible race condition in RetryInvocationHandler
> -------------------------------------------------
>
> Key: HADOOP-10630
> URL: https://issues.apache.org/jira/browse/HADOOP-10630
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Jing Zhao
> Assignee: Jing Zhao
> Fix For: 2.5.0
>
> Attachments: HADOOP-10630.000.patch
>
>
> In one of our system tests with a NameNode HA setup, we ran 300 threads in
> LoadGenerator. Although one of the NameNodes was already in the active state
> and had started serving requests, we still saw one of the client threads fail
> all of its retries within a 20-second window. Meanwhile, we saw many
> occurrences of the following warning message in the log:
> {noformat}
> WARN retry.RetryInvocationHandler: A failover has occurred since the start of this method invocation attempt.
> {noformat}
> After checking the code, we see the following code in RetryInvocationHandler:
> {code}
> while (true) {
>   // The number of times this invocation handler has ever been failed over,
>   // before this method invocation attempt. Used to prevent concurrent
>   // failed method invocations from triggering multiple failover attempts.
>   long invocationAttemptFailoverCount;
>   synchronized (proxyProvider) {
>     invocationAttemptFailoverCount = proxyProviderFailoverCount;
>   }
>   ......
>   if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
>     // Make sure that concurrent failed method invocations only cause a
>     // single actual fail over.
>     synchronized (proxyProvider) {
>       if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
>         proxyProvider.performFailover(currentProxy.proxy);
>         proxyProviderFailoverCount++;
>         currentProxy = proxyProvider.getProxy();
>       } else {
>         LOG.warn("A failover has occurred since the start of this method"
>             + " invocation attempt.");
>       }
>     }
>     invocationFailoverCount++;
>   }
>   ......
> {code}
> As the code shows, currentProxy is refreshed only by the thread that performs
> the failover (while holding the monitor of proxyProvider). Because
> "currentProxy" is not volatile, a thread that does not perform the failover
> (in which case it logs the warning message) may fail to see the new value of
> currentProxy.
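
For illustration only, here is a minimal sketch of one way the visibility gap described above could be closed. It is not the attached patch. It assumes the proxy reference is declared volatile and that each attempt snapshots both the proxy and the failover count under the proxyProvider monitor. All class and method names here (SketchProxyProvider, FailoverSketch, Attempt, beginAttempt, failoverIfNeeded) are hypothetical, and a String stands in for the real proxy object.

```java
/** Hypothetical stand-in for a failover proxy provider; not the Hadoop API. */
class SketchProxyProvider {
  private int index = 0;

  synchronized void performFailover() {
    index++;
  }

  synchronized String getProxy() {
    return "proxy-" + index;
  }
}

/** Hypothetical handler showing safe publication of the current proxy. */
class FailoverSketch {
  private final SketchProxyProvider proxyProvider = new SketchProxyProvider();

  // Guarded by the proxyProvider monitor.
  private long proxyProviderFailoverCount = 0;

  // Assumption: volatile, so a write made by the thread that fails over is
  // visible to threads that read the field without holding the monitor.
  private volatile String currentProxy = proxyProvider.getProxy();

  /** Snapshot taken at the start of one invocation attempt. */
  static final class Attempt {
    final String proxy;
    final long failoverCount;

    Attempt(String proxy, long failoverCount) {
      this.proxy = proxy;
      this.failoverCount = failoverCount;
    }
  }

  /** Reads the proxy and the failover count together, under the same lock. */
  Attempt beginAttempt() {
    synchronized (proxyProvider) {
      return new Attempt(currentProxy, proxyProviderFailoverCount);
    }
  }

  /**
   * Fails over only if nobody else has done so since the attempt began.
   * Returns true if this caller performed the failover.
   */
  boolean failoverIfNeeded(Attempt attempt) {
    synchronized (proxyProvider) {
      if (attempt.failoverCount == proxyProviderFailoverCount) {
        proxyProvider.performFailover();
        proxyProviderFailoverCount++;
        currentProxy = proxyProvider.getProxy();
        return true;
      }
      // Another thread already failed over; the next beginAttempt() call
      // picks up the proxy that thread published.
      return false;
    }
  }
}
```

In this sketch, a thread that loses the race and only logs the warning still gets the refreshed proxy on its next attempt, because beginAttempt() re-reads currentProxy under the same monitor the failover thread held when it wrote it.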