[ 
https://issues.apache.org/jira/browse/HDFS-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455620#comment-13455620
 ] 

Colin Patrick McCabe commented on HDFS-3936:
--------------------------------------------

I would lean towards solution #3.  It might need a little finesse, but in 
theory it should be simple to give the lock the semantics "wait until we get 
the lock or are told to exit."
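
A minimal sketch of that lock semantic, not the actual HDFS change: the 
waiter polls with a timed tryLock and re-checks an exit flag between 
attempts, so it never blocks uninterruptibly.  The class ExitAwareLock and 
its methods lockOrExit, requestExit and unlock are hypothetical names used 
for illustration only, and a plain ReentrantLock stands in for the FSN lock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper for illustration; not part of HDFS. */
class ExitAwareLock {
  // Stand-in for the FSN lock (assumption: a plain ReentrantLock).
  private final ReentrantLock lock = new ReentrantLock();
  private final AtomicBoolean shouldExit = new AtomicBoolean(false);

  /**
   * Waits until the lock is acquired or an exit is requested.
   * Returns true if the caller now holds the lock, false if it was
   * told to exit (via requestExit() or a thread interrupt).
   */
  boolean lockOrExit(long pollMillis) {
    while (!shouldExit.get()) {
      try {
        if (lock.tryLock(pollMillis, TimeUnit.MILLISECONDS)) {
          if (shouldExit.get()) {
            // Exit was requested while we were acquiring; back out.
            lock.unlock();
            return false;
          }
          return true;
        }
      } catch (InterruptedException e) {
        // Treat an interrupt as a request to exit; keep the status set.
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;
  }

  void unlock() {
    lock.unlock();
  }

  /** Tells current and future waiters to give up. */
  void requestExit() {
    shouldExit.set(true);
  }

  public static void main(String[] args) throws InterruptedException {
    ExitAwareLock guard = new ExitAwareLock();
    guard.lockOrExit(10);        // "shutdown" thread takes the lock
    Thread monitor = new Thread(() ->
        System.out.println("acquired=" + guard.lockOrExit(10)));
    monitor.start();
    guard.requestExit();         // tell the waiter to stop
    monitor.join();              // returns promptly, no deadlock
    guard.unlock();
  }
}
```

With this shape, the thread doing the shutdown can keep holding the lock, 
request the exit, and then join the waiter without the join timing out.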

I'm afraid of hitting other issues if we go with #4, since 
BlockManager#replicationThread touches a lot more than just BlocksMap.  
The replication manager does a lot of work, and it really seems like we're 
asking for trouble if we don't shut it down at the end.
                
> MiniDFSCluster shutdown may fail due to BlocksMap#getBlockCollection NPE
> ------------------------------------------------------------------------
>
>                 Key: HDFS-3936
>                 URL: https://issues.apache.org/jira/browse/HDFS-3936
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>
> Looks like HDFS-3664 didn't fix the whole issue: the added join times out 
> because the thread closing the BM (FSN#stopCommonServices) holds the FSN 
> lock while closing the BM, and the BM is blocked uninterruptibly trying to 
> acquire the FSN lock.
> {noformat}
> 2012-09-13 18:54:12,526 FATAL hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1355)) - Test resulted in an unexpected exit
> org.apache.hadoop.util.ExitUtil$ExitException: Fatal exception with message null
> stack trace
> java.lang.NullPointerException
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.getBlockCollection(BlocksMap.java:101)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1132)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1107)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3061)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3023)
>       at java.lang.Thread.run(Thread.java:662)
> {noformat}
