[ 
https://issues.apache.org/jira/browse/HDFS-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086415#comment-13086415
 ] 

Eric Payne commented on HDFS-1257:
----------------------------------

Hi Nicholas. Thanks for your patience in getting through the reviews of this.

I'm confused as to why 1) you are seeing this error and 2) it is timing out for 
you. I'm not seeing that error in my environment. As for the timeout, even 
when the test was taking 3 minutes, it should not have timed out. There are a 
lot of unit tests that take longer than 3 minutes.

Anyway, as for taking it out, the reason for doing so would be that the test is 
not sufficient to thoroughly test the race condition. A unit test just can't 
stress the namenode in the MiniDFSCluster enough to exercise this race 
condition. To hit it, a test must run in a large cluster with a very active 
set of DFS operations happening over an extended period of time. There just 
isn't enough memory on a single host to create enough DNs in the 
MiniDFSCluster. And, even if there were enough memory, a unit test should not 
be running for a very long period.


> Race condition on FSNamesystem#recentInvalidateSets introduced by HADOOP-5124
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-1257
>                 URL: https://issues.apache.org/jira/browse/HDFS-1257
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Ramkumar Vadali
>            Assignee: Eric Payne
>             Fix For: 0.23.0
>
>         Attachments: HDFS-1257.1.20110810.patch, HDFS-1257.2.20110812.patch, 
> HDFS-1257.3.20110815.patch, HDFS-1257.4.20110816.patch, HDFS-1257.patch
>
>
> HADOOP-5124 provided some improvements to FSNamesystem#recentInvalidateSets. 
> But it introduced unprotected access to the data structure 
> recentInvalidateSets. Specifically, FSNamesystem.computeInvalidateWork 
> accesses recentInvalidateSets without read-lock protection. If there is 
> concurrent activity (like reducing replication on a file) that adds to 
> recentInvalidateSets, the name-node crashes with a 
> ConcurrentModificationException.
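
For readers unfamiliar with this failure mode, here is a minimal sketch of it 
in plain Java. It is not the FSNamesystem code and not the attached patch. The 
class name, the method names, and the map type (a TreeMap from a storage ID to 
a set of block IDs) are all assumptions made for illustration. The first part 
shows the kind of lock-guarded access the description asks for. The second 
part reproduces the exception deterministically in a single thread by 
modifying the map in the middle of an iteration, which is what a concurrent 
writer does to an unguarded reader.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the namenode state discussed in this issue.
class InvalidateSetsSketch {
    // Assumed shape: storage ID -> IDs of blocks waiting to be invalidated.
    private final Map<String, Collection<Long>> recentInvalidateSets = new TreeMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Writer path (e.g. reducing replication on a file): takes the write lock.
    void addToInvalidates(String storageId, long blockId) {
        lock.writeLock().lock();
        try {
            recentInvalidateSets
                .computeIfAbsent(storageId, k -> new HashSet<>())
                .add(blockId);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reader path: copies the keys while holding the read lock, so no writer
    // can change the map during the iteration done by the copy.
    List<String> nodesWithWork() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(recentInvalidateSets.keySet());
        } finally {
            lock.readLock().unlock();
        }
    }

    // Failure mode without any guard: a structural change in the middle of an
    // iteration makes the iterator throw on its next step.
    static boolean unguardedIterationFails() {
        Map<String, Collection<Long>> sets = new TreeMap<>();
        sets.put("dn-1", new HashSet<>());
        sets.put("dn-2", new HashSet<>());
        try {
            for (String storageId : sets.keySet()) {
                // Stands in for a put performed by another thread.
                sets.put("dn-3", new HashSet<>());
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```

In a real cluster the writer runs on another thread, so whether the exception 
appears depends on timing. The single-threaded version above only shows the 
mechanism. It says nothing about how often the race is hit in practice.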

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
