[ 
https://issues.apache.org/jira/browse/HDFS-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739356#comment-13739356
 ] 

Vinay commented on HDFS-2882:
-----------------------------

bq. Did you reproduce the problem? If so, what were the steps to reproduce?
Please check the test. I just reproduced the cases Todd mentioned.

bq. Also, your patch seems to make the DataNode loop endlessly trying to 
initialize any block pools that don't come up. I don't think that's what we 
want to do here.
No. In a nameservice with multiple namenodes, the retry is infinite only for 
the remaining namenode, and only once one of the namenodes has connected and 
the BPOS is initialized. If both namenodes fail to initialize, the BPOS exits 
instead of retrying.
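
The retry rule described above can be sketched in a few lines of Java. This is only an illustration of the decision logic as stated in this comment, not code from the patch: the class {{BpInitRetrySketch}}, the enum {{ActorState}}, and both methods are hypothetical names, and the model assumes one actor per namenode.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the retry rule; none of these names come from Hadoop.
class BpInitRetrySketch {
    /** Assumed outcome of the latest initialization attempt of one actor (one per namenode). */
    enum ActorState { INITIALIZED, FAILED }

    /** A failed actor keeps retrying only if some actor has already initialized the block pool. */
    static boolean shouldRetry(List<ActorState> allActors) {
        return allActors.contains(ActorState.INITIALIZED);
    }

    /** The block pool service exits only when every actor has failed to initialize. */
    static boolean shouldExit(List<ActorState> allActors) {
        return !allActors.isEmpty()
                && allActors.stream().allMatch(s -> s == ActorState.FAILED);
    }

    public static void main(String[] args) {
        List<ActorState> oneUp = Arrays.asList(ActorState.INITIALIZED, ActorState.FAILED);
        List<ActorState> bothDown = Arrays.asList(ActorState.FAILED, ActorState.FAILED);
        System.out.println("one up:    retry=" + shouldRetry(oneUp) + " exit=" + shouldExit(oneUp));
        System.out.println("both down: retry=" + shouldRetry(bothDown) + " exit=" + shouldExit(bothDown));
    }
}
```

With one namenode initialized, the failed actor keeps retrying and the service stays up. With both failed, there is nothing to retry against and the service exits.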

One more thing: {{BPServiceActor#retrieveNamespaceInfo()}} runs in an infinite 
loop. Yes, this can make initialization loop forever if the namenode is down or 
not responding, but my patch does not change this.
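
To show why a loop of this kind can block initialization, here is a generic retry-until-success sketch in Java. It is not the body of {{retrieveNamespaceInfo()}}: the class {{RetryUntilAvailableSketch}}, the method {{retryUntilSuccess}}, and its parameters are hypothetical, and the sleep interval is arbitrary.

```java
import java.util.concurrent.Callable;
import java.util.function.BooleanSupplier;

// Hypothetical illustration of a retry-until-success loop; not Hadoop code.
class RetryUntilAvailableSketch {
    /**
     * Calls request until it succeeds or shouldRun returns false.
     * Returns null if stopped or interrupted before any success.
     */
    static <T> T retryUntilSuccess(Callable<T> request, BooleanSupplier shouldRun,
                                   long sleepMillis) {
        while (shouldRun.getAsBoolean()) {
            try {
                return request.call();
            } catch (Exception e) {
                // Remote side down or not responding: wait, then try again.
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String info = retryUntilSuccess(() -> {
            // Simulated namenode that only answers on the third attempt.
            if (++calls[0] < 3) throw new java.io.IOException("namenode not responding");
            return "namespace-info";
        }, () -> true, 1L);
        System.out.println(info + " after " + calls[0] + " attempts");
    }
}
```

If the request never succeeds and the run flag never turns false, the caller never returns, which is the situation described above for a namenode that stays down.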
                
> DN continues to start up, even if block pool fails to initialize
> ----------------------------------------------------------------
>
>                 Key: HDFS-2882
>                 URL: https://issues.apache.org/jira/browse/HDFS-2882
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.0.2-alpha
>            Reporter: Todd Lipcon
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-2882.patch, hdfs-2882.txt
>
>
> I started a DN on a machine that was completely out of space on one of its 
> drives. I saw the following:
> 2012-02-02 09:56:50,499 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> block pool Block pool BP-448349972-172.29.5.192-1323816762969 (storage id 
> DS-507718931-172.29.5.194-11072-1297842002148) service to 
> styx01.sf.cloudera.com/172.29.5.192:8021
> java.io.IOException: Mkdirs failed to create 
> /data/1/scratch/todd/styx-datadir/current/BP-448349972-172.29.5.192-1323816762969/tmp
>         at 
> org.apache.hadoop.hdfs.server.datanode.FSDataset$BlockPoolSlice.<init>(FSDataset.java:335)
> but the DN continued to run, spewing NPEs when it tried to do block reports, 
> etc. This was on the HDFS-1623 branch but may affect trunk as well.
