[ 
https://issues.apache.org/jira/browse/HDFS-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3616:
----------------------------

    Attachment: HDFS-3616.trunk.001.patch

After checking the code, I think the exception is caused by this sequence:

1. In DataNode#shutdown(), DataNode#shouldRun is set to false.

2. BPServiceActor#run() stops running and calls BPServiceActor#cleanUp().

3. While executing BPServiceActor#cleanUp(), DataNode#shutdownBlockPool() is 
called, and it executes blockPoolManager.remove(bpos) before DataNode#shutdown() 
calls "this.blockPoolManager.shutDownAll();". Thus the corresponding 
BPOfferService cannot be seen and shut down by blockPoolManager#shutDownAll(), 
since it has already been removed from BlockPoolManager#offerServices.

4. The actor thread continues running DataNode#shutdownBlockPool(), which 
eventually tries to remove a record from FsVolumeImpl#bpSlices while the 
DataNode shutdown thread is in FsVolumeImpl#shutdown(), iterating over 
bpSlices. Thus the ConcurrentModificationException may be thrown.
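
To illustrate step 4, here is a minimal sketch of the fail-fast behavior, not 
the actual FsVolumeImpl code. It simulates the interleaving in a single thread 
so the result is deterministic: the loop plays the role of 
FsVolumeImpl#shutdown(), and the remove() call stands in for the actor thread. 
The class name, method names, and the String keys/values are assumptions made 
for illustration only.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class BpSlicesRaceSketch {

    /** Fills a map with three hypothetical block pool entries. */
    static Map<String, String> sample(Map<String, String> bpSlices) {
        bpSlices.put("BP-1", "slice-1");
        bpSlices.put("BP-2", "slice-2");
        bpSlices.put("BP-3", "slice-3");
        return bpSlices;
    }

    /**
     * Iterates the map (as the shutdown path does) while the map is
     * structurally modified mid-iteration (as the actor thread does).
     * Returns true if a ConcurrentModificationException was thrown.
     */
    static boolean iterateWhileRemoving(Map<String, String> bpSlices) {
        try {
            for (Map.Entry<String, String> e : bpSlices.entrySet()) {
                // Stand-in for the actor thread removing a block pool record.
                bpSlices.remove(e.getKey());
            }
            return false;
        } catch (ConcurrentModificationException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        boolean threw = iterateWhileRemoving(sample(new HashMap<>()));
        System.out.println("HashMap iteration threw CME: " + threw);
    }
}
```

With two real threads the failure is timing-dependent, which would match the 
bug showing up only occasionally in precommit builds.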

So to avoid changing other code, maybe we can simply change bpSlices from 
HashMap to ConcurrentHashMap? A simple patch based on this is attached.
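
The idea behind this change can be sketched as follows. This is not the 
attached patch, only a standalone illustration with hypothetical names 
(ConcurrentBpSlicesSketch, runShutdownRace) and String placeholders for the 
real slice objects. One thread removes entries while another iterates, and 
any Throwable raised by either thread is captured and returned.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

public class ConcurrentBpSlicesSketch {

    /**
     * Runs a remover thread and an iterating thread against the same
     * ConcurrentHashMap. Returns the first Throwable seen, or null if none.
     */
    static Throwable runShutdownRace(final int pools) {
        final Map<String, String> bpSlices = new ConcurrentHashMap<>();
        for (int i = 0; i < pools; i++) {
            bpSlices.put("BP-" + i, "slice-" + i);
        }
        final AtomicReference<Throwable> failure = new AtomicReference<>();

        // Stand-in for the actor thread in DataNode#shutdownBlockPool().
        Thread actor = new Thread(() -> {
            try {
                for (int i = 0; i < pools; i++) {
                    bpSlices.remove("BP-" + i);
                }
            } catch (Throwable t) {
                failure.compareAndSet(null, t);
            }
        });

        // Stand-in for the shutdown thread iterating over bpSlices.
        Thread shutdown = new Thread(() -> {
            try {
                for (Map.Entry<String, String> e : bpSlices.entrySet()) {
                    e.getValue().length(); // placeholder for per-slice shutdown work
                }
            } catch (Throwable t) {
                failure.compareAndSet(null, t);
            }
        });

        actor.start();
        shutdown.start();
        try {
            actor.join();
            shutdown.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ie);
        }
        return failure.get();
    }

    public static void main(String[] args) {
        System.out.println("failure: " + runShutdownRace(1000));
    }
}
```

ConcurrentHashMap iterators are weakly consistent and never throw 
ConcurrentModificationException, so the iterating thread may or may not see 
an entry that is being removed at the same time. Whether skipping such a 
slice during shutdown is acceptable is a separate question for review.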
                
> TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException 
> in DN shutdown
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3616
>                 URL: https://issues.apache.org/jira/browse/HDFS-3616
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: Jing Zhao
>         Attachments: HDFS-3616.trunk.001.patch
>
>
> I have seen this in precommit build #2743
> {noformat}
> java.util.ConcurrentModificationException
>       at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
>       at java.util.HashMap$EntryIterator.next(HashMap.java:834)
>       at java.util.HashMap$EntryIterator.next(HashMap.java:832)
>       at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:209)
>       at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:168)
>       at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1214)
>       at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1105)
>       at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1324)
>       at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1304)
>       at org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:100)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
