[
https://issues.apache.org/jira/browse/HDFS-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055606#comment-13055606
]
Eric Payne commented on HDFS-1257:
----------------------------------
What is the status of this Jira?
I believe that I am also running into this issue. I am using the yahoo_merge
branch, but the behavior should be the same in all branches.
When running stress tests, the NameNode daemon receives a
ConcurrentModificationException and exits under certain race conditions.
This seems to be a fairly critical bug, since it can bring the NameNode down
under load.
The configuration I am using runs a single independent namenode on one
machine and hundreds of simulated (by MiniDFSCluster) datanodes on each of 9
other machines, for a total of up to 2000 simulated datanodes.
Then, in this environment, the DataNodeGenerator test is run, which performs
random reads, creates, writes, and deletes. The goal is to stress the NameNode
with hundreds of operations per second.
The race occurs while ReplicationMonitor is computing invalidation work: the
recentInvalidateSets TreeMap inside BlockManager is modified by another thread
while the ReplicationMonitor thread is iterating over it.
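To illustrate the pattern (a minimal standalone sketch, not the actual Hadoop
code): copying the key set of a TreeMap in one thread while another thread
inserts into it without any shared lock fails in exactly this way.
{code}
import java.util.ArrayList;
import java.util.Map;
import java.util.TreeMap;

// Minimal sketch of the unsynchronized-iteration pattern; the names mirror
// BlockManager, but none of this is the actual Hadoop code.
public class CmeSketch {
  static final Map<String, String> recentInvalidateSets =
      new TreeMap<String, String>();

  public static void main(String[] args) {
    // Writer thread: keeps adding entries, the way block invalidations get
    // queued when replication is reduced or files are deleted.
    Thread writer = new Thread(new Runnable() {
      public void run() {
        for (int i = 0; ; i++) {
          recentInvalidateSets.put("datanode-" + i, "blocks-to-invalidate");
        }
      }
    });
    writer.setDaemon(true);
    writer.start();

    // Reader: TreeMap iterators are fail-fast, so copying the key set while
    // the writer is inserting soon throws ConcurrentModificationException,
    // just like the ArrayList copy in computeInvalidateWork.
    while (true) {
      new ArrayList<String>(recentInvalidateSets.keySet());
    }
  }
}
{code}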
Here is the exception and stack trace:
2011-06-08 15:33:41,551 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: ReplicationMonitor thread received Runtime exception.
java.util.ConcurrentModificationException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
at java.util.AbstractCollection.toArray(AbstractCollection.java:124)
at java.util.ArrayList.<init>(ArrayList.java:131)
at org.apache.hadoop.hdfs.server.namenode.BlockManager.computeInvalidateWork(BlockManager.java:682)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.computeDatanodeWork(FSNamesystem.java:2978)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:2925)
at java.lang.Thread.run(Thread.java:619)
One thing I did try was to go into BlockManager and put synchronized blocks
around every place that iterates over, adds to, or removes from the
recentInvalidateSets TreeMap (roughly as sketched below).
I'm not sure what performance (or other unforeseen) ramifications this may
have, but it did eliminate the ConcurrentModificationException, at least in my
test environment.
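For illustration only, the workaround boils down to something like the
following (hypothetical class and method names, not the real BlockManager
code): every reader and writer of the shared map takes the same lock.
{code}
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch of the workaround (hypothetical names, not the actual
// BlockManager code): all iteration over, addition to, and removal from the
// shared TreeMap is guarded by the same lock, so the key-set copy made by
// the replication monitor can no longer race with concurrent inserts.
class InvalidateSets {
  private final Map<String, Collection<Long>> recentInvalidateSets =
      new TreeMap<String, Collection<Long>>();

  void add(String storageId, Collection<Long> blocks) {
    synchronized (recentInvalidateSets) {
      recentInvalidateSets.put(storageId, blocks);
    }
  }

  void remove(String storageId) {
    synchronized (recentInvalidateSets) {
      recentInvalidateSets.remove(storageId);
    }
  }

  List<String> snapshotStorageIds() {
    // The copy of the key set that the monitor iterates over is now taken
    // under the same lock as the writers above.
    synchronized (recentInvalidateSets) {
      return new ArrayList<String>(recentInvalidateSets.keySet());
    }
  }
}
{code}
Synchronizing on the map itself was just the simplest thing to try; given that
the issue is about computeInvalidateWork accessing recentInvalidateSets without
read-lock protection, the proper fix may be to take the existing namesystem
lock there instead.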
> Race condition introduced by HADOOP-5124
> ----------------------------------------
>
> Key: HDFS-1257
> URL: https://issues.apache.org/jira/browse/HDFS-1257
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Reporter: Ramkumar Vadali
> Attachments: HDFS-1257.patch
>
>
> HADOOP-5124 provided some improvements to FSNamesystem#recentInvalidateSets.
> But it introduced unprotected access to the data structure
> recentInvalidateSets. Specifically, FSNamesystem.computeInvalidateWork
> accesses recentInvalidateSets without read-lock protection. If there is
> concurrent activity (like reducing replication on a file) that adds to
> recentInvalidateSets, the name-node crashes with a
> ConcurrentModificationException.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira