[ 
https://issues.apache.org/jira/browse/HDFS-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829452#comment-13829452
 ] 

Vinay commented on HDFS-2882:
-----------------------------

Hi Colin,
Thanks for taking a look at the patch, and sorry for the confusion.

Yes, it is easy to reproduce, but only in an HA installation:
1. Make one of the data directory unwritable
2. Restart the datanode

Here, block pool initialization fails for the first NameNode the DN connects to,
and that BPServiceActor (BPSA) exits. The actor for the second NameNode, however,
does not try to initialize the block pool at all, because the namespace info has
already been set (it is no longer null). That actor then keeps sending heartbeats
and throws NPEs continuously.
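To make the failure sequence above concrete, here is a heavily simplified, hypothetical Java model. It is not the real DataNode code and not the patch: `SharedBlockPool`, `needsInitBuggy`, `needsInitFixed` and `runActor` are invented names. It assumes two per-NameNode actors share one block pool's state, that the namespace info is recorded before storage setup fails, and that the actors run sequentially for determinism. The "buggy" check treats "namespace info known" as "block pool initialized"; the guarded check instead keys off whether storage actually came up, so the second actor retries (and fails) initialization rather than heartbeating into an NPE.

```java
import java.io.IOException;

/** Hypothetical model of two NameNode actors sharing one block pool's state. */
public class BlockPoolInitSketch {

    /** Shared per-block-pool state (illustrative stand-in, not a real Hadoop class). */
    static final class SharedBlockPool {
        String namespaceInfo; // set during the handshake with a NameNode
        String storageId;     // non-null only if storage setup succeeded

        // Buggy check: "namespace info is known" is taken to mean "initialized".
        boolean needsInitBuggy() {
            return namespaceInfo == null;
        }

        // Guarded check: initialization is complete only once storage is ready.
        boolean needsInitFixed() {
            return storageId == null;
        }

        void initialize(String nsInfo, boolean diskWritable) throws IOException {
            namespaceInfo = nsInfo; // recorded before storage setup, as in the report
            if (!diskWritable) {
                throw new IOException("Mkdirs failed to create .../tmp");
            }
            storageId = "DS-hypothetical";
        }

        /** Dereferences storage state; throws NPE if initialization never completed. */
        String heartbeat() {
            return "heartbeat sent (" + storageId.trim() + ")";
        }
    }

    /** Runs one actor against the shared block pool and describes the outcome. */
    static String runActor(SharedBlockPool bp, boolean useFix, boolean diskWritable) {
        boolean needsInit = useFix ? bp.needsInitFixed() : bp.needsInitBuggy();
        try {
            if (needsInit) {
                bp.initialize("ns-info", diskWritable);
            }
        } catch (IOException e) {
            return "init failed, actor exits";
        }
        try {
            return bp.heartbeat();
        } catch (NullPointerException e) {
            return "NPE in heartbeat";
        }
    }

    public static void main(String[] args) {
        for (boolean useFix : new boolean[] {false, true}) {
            SharedBlockPool bp = new SharedBlockPool();
            String first = runActor(bp, useFix, false);  // actor for NameNode 1
            String second = runActor(bp, useFix, false); // actor for NameNode 2
            System.out.println((useFix ? "fixed: " : "buggy: ") + first + " / " + second);
        }
    }
}
```

Running `main` should print `buggy: init failed, actor exits / NPE in heartbeat` followed by `fixed: init failed, actor exits / init failed, actor exits`. Whether the DN should then shut down or keep retrying is a separate policy question that this sketch deliberately leaves out.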

Todd suggested three scenarios to handle in this case and proposed an initial
patch; I just continued that approach.

> DN continues to start up, even if block pool fails to initialize
> ----------------------------------------------------------------
>
>                 Key: HDFS-2882
>                 URL: https://issues.apache.org/jira/browse/HDFS-2882
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.0.2-alpha
>            Reporter: Todd Lipcon
>            Assignee: Vinay
>         Attachments: HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, 
> HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, hdfs-2882.txt
>
>
> I started a DN on a machine that was completely out of space on one of its 
> drives. I saw the following:
> 2012-02-02 09:56:50,499 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-448349972-172.29.5.192-1323816762969 (storage id DS-507718931-172.29.5.194-11072-1297842002148) service to styx01.sf.cloudera.com/172.29.5.192:8021
> java.io.IOException: Mkdirs failed to create /data/1/scratch/todd/styx-datadir/current/BP-448349972-172.29.5.192-1323816762969/tmp
>         at org.apache.hadoop.hdfs.server.datanode.FSDataset$BlockPoolSlice.<init>(FSDataset.java:335)
> but the DN continued to run, spewing NPEs when it tried to do block reports, 
> etc. This was on the HDFS-1623 branch but may affect trunk as well.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
