[jira] [Commented] (HDFS-2882) DN continues to start up, even if block pool fails to initialize

Todd Lipcon (Commented) (JIRA) Wed, 04 Apr 2012 15:50:44 -0700

    [ https://issues.apache.org/jira/browse/HDFS-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246810#comment-13246810 ]


Todd Lipcon commented on HDFS-2882:
-----------------------------------

Trying to think through what the correct/expected behavior is here:

Scenario 1: user configures DN to point to a single cluster which doesn't match 
its storage
Possible results:
1) DN keeps running, with no block pools
2) DN shuts down once its one-and-only service fails.

I think #2 is fairly clearly correct here.

----

Scenario 2: user configures DN to point to one NN. The user adds an additional 
nameservice to the config and issues a -refreshNamenodes call. The newly added 
nameservice is from the wrong cluster.
Possible results:
1) DN tries to connect and fails. It logs a message indicating this, but keeps 
running with its existing service.
2) DN tries to connect and fails. It aborts the whole datanode.

I think #1 is correct here.

----

Scenario 3: user configures DN to point to two different NNs which are on 
different clusters, and starts up.
Possible results:
1) DN connects to cluster 1 and joins the cluster. Service to cluster 2 fails, 
but the DN stays running. The admin may issue -refreshNamenodes to try to 
connect again.
2) DN connects to cluster 1 and joins the cluster. When service to cluster 2 is 
rejected, the DN shuts down.

It's unclear what the correct result is here. I lean towards #2, but I'm not 
certain.
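
To make the three scenarios concrete, here is a rough sketch of the policy as a pure function. It is not taken from the DataNode code: the class, enum, method and parameter names are all made up for illustration. The scenario 3 branch is left as a flag because the right answer there is still open, and a refresh that fails when no service is healthy is not covered by the scenarios, so the sketch simply keeps running in that case.

```java
// Hypothetical sketch only; none of these names exist in the DataNode code.
final class BlockPoolFailurePolicy {

    /** When the block pool service failed to initialize. */
    enum Phase { STARTUP, REFRESH }

    /** What the DN does in response to the failure. */
    enum Action { SHUT_DOWN, KEEP_RUNNING }

    /**
     * @param phase                    whether the failure happened at startup or
     *                                 after a -refreshNamenodes call
     * @param healthyServices          number of block pool services that did
     *                                 initialize successfully
     * @param shutDownOnPartialFailure assumed switch for scenario 3, where the
     *                                 correct behavior is still undecided
     */
    static Action onInitFailure(Phase phase, int healthyServices,
                                boolean shutDownOnPartialFailure) {
        if (phase == Phase.REFRESH) {
            // Scenario 2: log the failure and keep serving the existing pools.
            return Action.KEEP_RUNNING;
        }
        if (healthyServices == 0) {
            // Scenario 1: the one-and-only service failed, nothing left to serve.
            return Action.SHUT_DOWN;
        }
        // Scenario 3: some services are up, one was rejected at startup.
        return shutDownOnPartialFailure ? Action.SHUT_DOWN : Action.KEEP_RUNNING;
    }

    public static void main(String[] args) {
        System.out.println("scenario 1: " + onInitFailure(Phase.STARTUP, 0, true));
        System.out.println("scenario 2: " + onInitFailure(Phase.REFRESH, 1, true));
        System.out.println("scenario 3: " + onInitFailure(Phase.STARTUP, 1, true));
    }
}
```

Passing true for the last argument corresponds to option #2 in scenario 3, and false to option #1.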
                
> DN continues to start up, even if block pool fails to initialize
> ----------------------------------------------------------------
>
>                 Key: HDFS-2882
>                 URL: https://issues.apache.org/jira/browse/HDFS-2882
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> I started a DN on a machine that was completely out of space on one of its 
> drives. I saw the following:
> 2012-02-02 09:56:50,499 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-448349972-172.29.5.192-1323816762969 (storage id DS-507718931-172.29.5.194-11072-1297842002148) service to styx01.sf.cloudera.com/172.29.5.192:8021
> java.io.IOException: Mkdirs failed to create /data/1/scratch/todd/styx-datadir/current/BP-448349972-172.29.5.192-1323816762969/tmp
>         at org.apache.hadoop.hdfs.server.datanode.FSDataset$BlockPoolSlice.<init>(FSDataset.java:335)
> but the DN continued to run, spewing NPEs when it tried to do block reports, 
> etc. This was on the HDFS-1623 branch but may affect trunk as well.

