[ 
https://issues.apache.org/jira/browse/HDFS-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002189#comment-14002189
 ] 

Yongjun Zhang commented on HDFS-6428:
-------------------------------------

Hi [~daryn] and [~cmccabe], 

Thanks for your comments. 

Regarding Daryn's comment: I tried to find out exactly what else is modifying 
bpSlices by adding some debug printing, but then the CME went away. The problem 
is intermittent by nature. I think Colin's analysis of what could have happened, 
based on reading the source code, makes sense. I had a similar observation and, 
before posting the patch, actually tried adding synchronized to ALL methods that 
access bpSlices. That run turned out to be really slow, and it looked like 
overkill, so I ended up synchronizing only the place that showed the CME. 

I will take Colin's suggestion and also make shutdownBlockPool synchronized, to 
see whether it causes any significant increase in run time. Hopefully not, since 
it only runs at the shutdown stage. In theory, though, we should synchronize all 
accesses. Thanks.
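
To illustrate the failure mode and the kind of fix discussed above, here is a minimal sketch. It is not the actual FsVolumeImpl code: the class BpSlicesSketch, its String-valued map, and the helper unsafeIterationThrows are hypothetical stand-ins. Only the names bpSlices, shutdown and shutdownBlockPool come from this thread. The helper reproduces the CME deterministically in a single thread by modifying a HashMap while iterating over it, which is what an unsynchronized second thread could do at an unlucky moment.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a volume that tracks per-block-pool slices.
final class BpSlicesSketch {
    // Assumption: block pool id -> slice name; the real map holds richer objects.
    private final Map<String, String> bpSlices = new HashMap<>();

    // Every method that touches bpSlices takes the same lock (this).
    public synchronized void addBlockPool(String bpid) {
        bpSlices.put(bpid, "slice-" + bpid);
    }

    public synchronized void shutdownBlockPool(String bpid) {
        bpSlices.remove(bpid);
    }

    // Iterates under the lock, so no other synchronized method can
    // modify the map while the loop is running.
    public synchronized List<String> shutdown() {
        List<String> stopped = new ArrayList<>();
        for (Map.Entry<String, String> e : bpSlices.entrySet()) {
            stopped.add(e.getValue());
        }
        return stopped;
    }

    // Simulates the race in one thread: a structural modification happens
    // between two steps of the iterator, so the next step fails fast.
    public static boolean unsafeIterationThrows() {
        Map<String, String> map = new HashMap<>();
        map.put("BP-1", "slice-BP-1");
        map.put("BP-2", "slice-BP-2");
        try {
            for (Map.Entry<String, String> e : map.entrySet()) {
                // Adding a new key changes the map's modification count.
                map.put("BP-late", "slice-BP-late");
            }
        } catch (ConcurrentModificationException cme) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("unsafe iteration throws CME: " + unsafeIterationThrows());

        BpSlicesSketch volume = new BpSlicesSketch();
        volume.addBlockPool("BP-1");
        volume.addBlockPool("BP-2");
        volume.shutdownBlockPool("BP-1");
        System.out.println("slices stopped at shutdown: " + volume.shutdown().size());
    }
}
```

The sketch locks every accessor, which matches the "in theory" position above. Whether that is acceptable in the real code depends on the run-time cost observed in testing.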



> TestWebHdfsWithMultipleNameNodes failed with ConcurrentModificationException
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-6428
>                 URL: https://issues.apache.org/jira/browse/HDFS-6428
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-6428.001.patch
>
>
> TestWebHdfsWithMultipleNameNodes failed as follows:
> {code}
> Running org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.643 sec <<< FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
> org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes  Time elapsed: 3.771 sec  <<< ERROR!
> java.util.ConcurrentModificationException: null
>         at java.util.HashMap$HashIterator.nextEntry(HashMap.java:894)
>         at java.util.HashMap$EntryIterator.next(HashMap.java:934)
>         at java.util.HashMap$EntryIterator.next(HashMap.java:932)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:251)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:249)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1389)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1304)
>         at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1555)
>         at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1530)
>         at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1514)
>         at org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:99)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
