[ 
https://issues.apache.org/jira/browse/HDFS-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-5289:
---------------------------------

    Attachment: HDFS-5289.patch

Here's a patch that addresses the issue by catching the exception and retrying 
the check, instead of assuming that the check will return null. I tested this 
by adding a {{Thread.sleep(1000)}} to the thread that executes the operation in 
{{TestRetryCacheWithHA#testClientRetryWithFailover}}. With that sleep in place, 
the test fails reliably without this patch and passes reliably with it.
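
For readers without the attachment, here is a minimal, self-contained sketch of 
the retry pattern described above. It is not the code from HDFS-5289.patch. The 
class name {{RetryCheckSketch}}, the helper {{waitForResult}}, its parameters, and 
the choice of {{FileNotFoundException}} as the exception type are all 
assumptions made for illustration.

```java
import java.io.FileNotFoundException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from the patch.
public class RetryCheckSketch {

    /**
     * Polls the check until it yields a non-null result. A
     * FileNotFoundException (assumed exception type) is treated the same
     * way as a null result: the operation has not run yet, so retry.
     */
    public static <T> T waitForResult(Callable<T> check, int maxAttempts,
                                      long sleepMillis) throws Exception {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                T result = check.call();
                if (result != null) {
                    return result;
                }
            } catch (FileNotFoundException e) {
                // The operation thread has not created the file yet.
                // Fall through to the sleep and try again.
            }
            Thread.sleep(sleepMillis);
        }
        throw new IllegalStateException(
            "no result after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated check: throws on the first two calls, then succeeds.
        String target = waitForResult(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new FileNotFoundException("link not created yet");
            }
            return "/target";
        }, 10, 10);
        System.out.println(target + " after " + calls.get() + " checks");
    }
}
```

Without the catch block, the first {{FileNotFoundException}} would propagate out 
of the loop and fail the caller, which is the spurious failure mode this issue 
describes. Bounding the number of attempts keeps a genuinely broken check from 
hanging forever.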

> Race condition in TestRetryCacheWithHA#testCreateSymlink causes spurious test 
> failure
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-5289
>                 URL: https://issues.apache.org/jira/browse/HDFS-5289
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.1.1-beta
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>         Attachments: HDFS-5289.patch
>
>
> The code that checks whether the operation has completed on the active NN can 
> potentially execute before the thread actually doing the operation has run. 
> In this case the checking code retries only if the result of the check is 
> null. However, the test operation does not in fact return null; it throws an 
> exception if the file doesn't exist yet. We need to catch the exception and 
> retry.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
