[ https://issues.apache.org/jira/browse/HDFS-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002189#comment-14002189 ]
Yongjun Zhang commented on HDFS-6428:
-------------------------------------

Hi [~daryn] and [~cmccabe],

Thanks for your comments.

Regarding Daryn's comment: I tried to find out exactly what else is modifying bpSlices by adding some debug printing, but with the printing in place the CME went away. The problem is intermittent by nature.

I think Colin's analysis of what could have happened, based on reading the source code, makes sense. I had a similar observation, and before posting the patch I actually tried adding the synchronized keyword to ALL methods that access bpSlices. The run turned out to be really slow, and it looked like overkill, so I ended up synchronizing only the place that showed the CME.

I will take Colin's suggestion and add synchronization to shutdownBlockPool as well, to see whether it causes any significant increase in run time. Hopefully not, since it only runs at the shutdown stage. Still, in theory we should synchronize all access.

Thanks.

> TestWebHdfsWithMultipleNameNodes failed with ConcurrentModificationException
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-6428
>                 URL: https://issues.apache.org/jira/browse/HDFS-6428
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-6428.001.patch
>
>
> TestWebHdfsWithMultipleNameNodes failed as follows:
> {code}
> Running org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.643 sec <<< FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
> org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes  Time elapsed: 3.771 sec  <<< ERROR!
> java.util.ConcurrentModificationException: null
>       at java.util.HashMap$HashIterator.nextEntry(HashMap.java:894)
>       at java.util.HashMap$EntryIterator.next(HashMap.java:934)
>       at java.util.HashMap$EntryIterator.next(HashMap.java:932)
>       at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:251)
>       at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:249)
>       at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1389)
>       at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1304)
>       at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1555)
>       at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1530)
>       at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1514)
>       at org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:99)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
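
For readers following the synchronization discussion in the comment, here is a minimal sketch of the idea. It is not the actual FsVolumeImpl code and not the attached patch. Only the names bpSlices, shutdown, and shutdownBlockPool come from the discussion; the class VolumeSketch, the String-valued map, addBlockPool, size, and the two demo helpers are hypothetical and exist only for illustration. The sketch assumes bpSlices is a plain java.util.HashMap, which is what the stack trace suggests.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a volume that tracks per-block-pool state. */
class VolumeSketch {
    // Plain HashMap: not thread-safe, and its iterators are fail-fast.
    private final Map<String, String> bpSlices = new HashMap<>();

    // Every access to bpSlices goes through the same monitor (this).
    public synchronized void addBlockPool(String bpid) {
        bpSlices.put(bpid, "slice-" + bpid);
    }

    public synchronized void shutdownBlockPool(String bpid) {
        bpSlices.remove(bpid);
    }

    /** Iterates over all entries, as a shutdown path would; returns the count visited. */
    public synchronized int shutdown() {
        int visited = 0;
        for (Map.Entry<String, String> entry : bpSlices.entrySet()) {
            visited++; // a real implementation would release per-pool resources here
        }
        return visited;
    }

    public synchronized int size() {
        return bpSlices.size();
    }

    /**
     * Single-threaded reproduction of the fail-fast behaviour: a structural
     * modification during iteration makes the next call to next() throw.
     */
    static boolean unsafeIterationThrows() {
        Map<String, String> map = new HashMap<>();
        map.put("bp-1", "a");
        map.put("bp-2", "b");
        map.put("bp-3", "c");
        try {
            for (String key : map.keySet()) {
                map.put("bp-new", "d"); // adds a key while the iterator is live
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    /** One thread removes block pools while another repeatedly iterates. */
    static int runConcurrently() {
        VolumeSketch volume = new VolumeSketch();
        for (int i = 0; i < 1000; i++) {
            volume.addBlockPool("bp-" + i);
        }
        Thread remover = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                volume.shutdownBlockPool("bp-" + i);
            }
        });
        remover.start();
        for (int i = 0; i < 100; i++) {
            volume.shutdown(); // cannot interleave with a removal: same lock
        }
        try {
            remover.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return volume.size();
    }

    public static void main(String[] args) {
        System.out.println("unsynchronized pattern throws CME: " + unsafeIterationThrows());
        System.out.println("entries left after concurrent run: " + runConcurrently());
    }
}
```

Synchronizing only the iterating method is not enough in general: the method that modifies the map has to take the same lock, which is why the sketch marks shutdownBlockPool as synchronized too. Whether this costs noticeable run time in the real code is exactly the open question raised in the comment, and the sketch does not attempt to measure it.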