[ 
https://issues.apache.org/jira/browse/HDFS-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829617#comment-13829617
 ] 

Colin Patrick McCabe commented on HDFS-2882:
--------------------------------------------

bq. These are not exactly mis-configurations, May be possible cases where we 
need to decide datanode should go down/keep running.

No, they are exactly misconfigurations.  Here they are again:

{code}
Scenario 1: user configures DN to point to a single cluster which doesn't match 
its storage
Scenario 2: user configures DN to point to one NN. The user adds an additional 
nameservice to the config and issues a -refreshNamenodes call. The newly added 
nameservice is from the wrong cluster.
Scenario 3: user configures DN to point to two different NNs which are on 
different clusters, and starts up.
{code}

None of those are correct configurations.
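
What the three scenarios share is that a configured nameservice reports a cluster ID different from the one recorded in the DN's storage. A minimal sketch of that check follows, assuming the storage already carries a cluster ID. The class name, method name, and the "CID-A" / "ns1" style values are all hypothetical and are not taken from the DataNode code; the sketch says nothing about what the DN should do once a mismatch is found.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; this is not the actual DataNode logic. */
final class ClusterIdCheck {

    /**
     * Returns the nameservices whose reported cluster ID differs from the
     * cluster ID recorded in the DN's local storage.
     */
    static List<String> mismatched(String storageClusterId,
                                   Map<String, String> nnClusterIds) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, String> e : nnClusterIds.entrySet()) {
            if (!storageClusterId.equals(e.getValue())) {
                bad.add(e.getKey());
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        String storage = "CID-A"; // assumed: cluster ID already in DN storage

        // Scenario 1: the only configured cluster does not match the storage.
        Map<String, String> s1 = new LinkedHashMap<>();
        s1.put("ns1", "CID-B");
        System.out.println("scenario 1 mismatches: " + mismatched(storage, s1));

        // Scenarios 2 and 3: one nameservice matches, the other is from a
        // different cluster (added via refresh, or present at startup).
        Map<String, String> s23 = new LinkedHashMap<>();
        s23.put("ns1", "CID-A");
        s23.put("ns2", "CID-B");
        System.out.println("scenario 2/3 mismatches: " + mismatched(storage, s23));
    }
}
```

In all three scenarios the returned list is non-empty, which is why I call them misconfigurations rather than legitimate states.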

bq. Another issue HDFS-5529 raised by Brahma is due to disk error.

That is a good point.  Based on the log message he is seeing, the DN is 
definitely continuing to run for some time after a block pool has failed, 
which seems related to this bug.

I definitely think there are issues around DataNode / block pool lifecycle, but 
I don't have a good handle on what they are yet.  I need to review what the 
expected behavior is in these scenarios.

> DN continues to start up, even if block pool fails to initialize
> ----------------------------------------------------------------
>
>                 Key: HDFS-2882
>                 URL: https://issues.apache.org/jira/browse/HDFS-2882
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.0.2-alpha
>            Reporter: Todd Lipcon
>            Assignee: Vinay
>         Attachments: HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, 
> HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, hdfs-2882.txt
>
>
> I started a DN on a machine that was completely out of space on one of its 
> drives. I saw the following:
> 2012-02-02 09:56:50,499 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> block pool Block pool BP-448349972-172.29.5.192-1323816762969 (storage id 
> DS-507718931-172.29.5.194-11072-1297842002148) service to 
> styx01.sf.cloudera.com/172.29.5.192:8021
> java.io.IOException: Mkdirs failed to create 
> /data/1/scratch/todd/styx-datadir/current/BP-448349972-172.29.5.192-1323816762969/tmp
>         at 
> org.apache.hadoop.hdfs.server.datanode.FSDataset$BlockPoolSlice.<init>(FSDataset.java:335)
> but the DN continued to run, spewing NPEs when it tried to do block reports, 
> etc. This was on the HDFS-1623 branch but may affect trunk as well.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
