[ 
https://issues.apache.org/jira/browse/HDFS-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455632#comment-13455632
 ] 

Eli Collins commented on HDFS-3936:
-----------------------------------

I'll see what the patch for #3 looks like. It was my first instinct, but I 
thought it a little goofy to, e.g., fail an addBlock RPC because the 
replication monitor is interrupted.

bq. The replication manager does a lot of stuff, and it really seems like we're 
asking for trouble if we don't shut it down at the end.

With #4 the replication manager is shut down cleanly: it unwinds because all 
replication queues and the block map are shut down. I.e., rather than hitting 
an NPE in this case, it finishes its current cycle and bails cleanly. The only 
thing that BM#close does that prevents this is null out the array in BlocksMap.
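
To make the idea behind #4 concrete, here is a minimal sketch, not the actual 
patch. All class, field, and method names below are hypothetical stand-ins and 
not HDFS code. It assumes close() empties the queue and the map instead of 
nulling them, so a monitor thread that is mid-cycle sees "nothing to do" rather 
than an NPE, and then exits on its next check of the running flag.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: none of these names are real HDFS classes.
final class CleanUnwindSketch {
    // Stand-in for the block map: cleared on close, never set to null.
    private final Map<Long, String> blocks = new ConcurrentHashMap<>();
    // Stand-in for a replication queue.
    private final Queue<Long> neededReplications = new ConcurrentLinkedQueue<>();
    // Records any unexpected exception thrown inside the monitor thread.
    private final AtomicReference<Throwable> failure = new AtomicReference<>();
    private volatile boolean running = true;
    private final Thread monitor = new Thread(this::monitorLoop, "sketch-monitor");

    private void monitorLoop() {
        try {
            while (running) {
                Long id;
                while ((id = neededReplications.poll()) != null) {
                    // After close() this lookup returns null instead of throwing.
                    String owner = blocks.get(id);
                    if (owner == null) {
                        continue; // block is gone; skip it
                    }
                    // Real replication scheduling would happen here.
                }
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (Throwable t) {
            failure.set(t);
        }
    }

    void start() {
        monitor.start();
    }

    void addBlock(long id, String owner) {
        blocks.put(id, owner);
        neededReplications.add(id);
    }

    /** Empties the structures and waits for the monitor; true if it exited in time. */
    boolean close(long joinMillis) {
        running = false;
        neededReplications.clear();
        blocks.clear();
        try {
            monitor.join(joinMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !monitor.isAlive();
    }

    Throwable failure() {
        return failure.get();
    }

    public static void main(String[] args) {
        CleanUnwindSketch s = new CleanUnwindSketch();
        s.start();
        for (long i = 0; i < 1000; i++) {
            s.addBlock(i, "file-" + i);
        }
        boolean exited = s.close(5000);
        System.out.println("monitor exited: " + exited + ", failure: " + s.failure());
    }
}
```

The sketch uses concurrent collections only to stay self-contained; it says 
nothing about how BlocksMap itself is synchronized.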
                
> MiniDFSCluster shutdown may fail due to BlocksMap#getBlockCollection NPE
> ------------------------------------------------------------------------
>
>                 Key: HDFS-3936
>                 URL: https://issues.apache.org/jira/browse/HDFS-3936
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>
> Looks like HDFS-3664 didn't fix the whole issue: the added join times out 
> because the thread closing the BM (FSN#stopCommonServices) holds the FSN 
> lock while closing the BM, and the BM is blocked uninterruptibly trying to 
> acquire the FSN lock.
> {noformat}
> 2012-09-13 18:54:12,526 FATAL hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1355)) - Test resulted in an unexpected exit
> org.apache.hadoop.util.ExitUtil$ExitException: Fatal exception with message null
> stack trace
> java.lang.NullPointerException
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.getBlockCollection(BlocksMap.java:101)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1132)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1107)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3061)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3023)
>       at java.lang.Thread.run(Thread.java:662)
> {noformat}
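
The lock interaction in the description can be reproduced in isolation. The 
following is a rough sketch under stated assumptions: a plain ReentrantLock 
stands in for the FSN lock, and the class and method names are made up for 
illustration. A "closer" thread holds the lock, interrupts a worker, and joins 
it with a timeout. When the worker waits with lock(), the interrupt does not 
wake it and the join times out; when it waits with lockInterruptibly(), it 
gives up and the join returns.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: a plain ReentrantLock stands in for the FSN lock.
final class JoinUnderLockSketch {

    /**
     * Returns true if the worker terminated within the join timeout
     * while the calling ("closer") thread still held the lock.
     */
    static boolean workerExitsWhileCloserHoldsLock(boolean interruptible) {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch aboutToLock = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            aboutToLock.countDown();
            try {
                if (interruptible) {
                    lock.lockInterruptibly(); // throws if interrupted
                } else {
                    lock.lock();              // keeps waiting despite interrupts
                }
            } catch (InterruptedException e) {
                return; // gave up waiting for the lock
            }
            lock.unlock();
        });
        worker.setDaemon(true);

        lock.lock(); // the closer holds the lock for the whole shutdown attempt
        try {
            worker.start();
            aboutToLock.await();
            worker.interrupt();
            worker.join(500);
            return !worker.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock(); // lets an uninterruptible worker finish afterwards
        }
    }

    public static void main(String[] args) {
        System.out.println("lock():              exited in time = "
                + workerExitsWhileCloserHoldsLock(false));
        System.out.println("lockInterruptibly(): exited in time = "
                + workerExitsWhileCloserHoldsLock(true));
    }
}
```

With lock() the worker can only finish once the closer releases the lock, 
which in this sketch happens after the join has already timed out.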

