[ https://issues.apache.org/jira/browse/HDFS-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800989#comment-17800989 ]

ASF GitHub Bot commented on HDFS-17307:
---------------------------------------

matthewrossi opened a new pull request, #6387:
URL: https://github.com/apache/hadoop/pull/6387

   Restarting existing services using the docker-compose.yaml causes the
datanode to crash after a few seconds.
   
   How to reproduce:
   
   ```shell
   $ docker-compose up -d # everything starts ok
   $ docker-compose stop  # stop services without removing containers
   $ docker-compose up -d # everything starts, but datanode crashes after a few 
seconds
   ```
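   
   A workaround consistent with the root cause described below is to recreate the containers instead of restarting them. This assumes the HDFS directories are not mounted as persistent volumes (they live under /tmp inside each container), so fresh containers format fresh storage:
   
   ```shell
   $ docker-compose down   # remove the containers and their writable layers
   $ docker-compose up -d  # both daemons format fresh, so the clusterIDs match
   ```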
   
   The log produced by the datanode suggests the issue is due to a mismatch in 
the clusterIDs of the namenode and the datanode:
   
   ```
   datanode_1         | 2023-12-28 11:17:15 WARN  Storage:420 - Failed to add 
storage directory [DISK]file:/tmp/hadoop-hadoop/dfs/data
   datanode_1         | java.io.IOException: Incompatible clusterIDs in 
/tmp/hadoop-hadoop/dfs/data: namenode clusterID = 
CID-250bae07-6a8a-45ce-84bb-8828b37b10b7; datanode clusterID = 
CID-2c1c7105-7fdf-4a19-8ef8-7cb763e5b701 
   ```
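   
   The clusterID each daemon holds can be confirmed in the VERSION file of its storage directory. The service names and paths below match the log above; adjust them to your compose file:
   
   ```shell
   $ docker-compose exec namenode grep clusterID /tmp/hadoop-hadoop/dfs/name/current/VERSION
   $ docker-compose exec datanode grep clusterID /tmp/hadoop-hadoop/dfs/data/current/VERSION
   ```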
   
   After some troubleshooting, I found out the namenode is not reusing the
clusterID from the previous run because it cannot find it in the directory set
by ENSURE_NAMENODE_DIR=/tmp/hadoop-root/dfs/name. This is due to a change in
the default user of the namenode, which is now "hadoop", so the namenode is
actually writing this information to /tmp/hadoop-hadoop/dfs/name.
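   
   A minimal sketch of the fix, assuming the namenode service receives ENSURE_NAMENODE_DIR through its environment block (the service name and image tag here are illustrative, not copied from the actual file):
   
   ```yaml
   services:
     namenode:
       image: apache/hadoop:3
       command: ["hdfs", "namenode"]
       environment:
         # Must point at the directory the default "hadoop" user actually
         # writes, /tmp/hadoop-hadoop/dfs/name, so the existing clusterID is
         # found again on restart instead of triggering a re-format.
         ENSURE_NAMENODE_DIR: /tmp/hadoop-hadoop/dfs/name
   ```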
   
   See https://issues.apache.org/jira/browse/HDFS-17307




> docker-compose.yaml sets namenode directory wrong causing datanode failures 
> on restart
> --------------------------------------------------------------------------------------
>
>                 Key: HDFS-17307
>                 URL: https://issues.apache.org/jira/browse/HDFS-17307
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>            Reporter: Matthew Rossi
>            Priority: Major
>
> Restarting existing services using the docker-compose.yaml causes the
> datanode to crash after a few seconds.
> How to reproduce:
> {code:java}
> $ docker-compose up -d # everything starts ok
> $ docker-compose stop  # stop services without removing containers
> $ docker-compose up -d # everything starts, but datanode crashes after a few 
> seconds{code}
> The log produced by the datanode suggests the issue is due to a mismatch in 
> the clusterIDs of the namenode and the datanode:
> {code:java}
> datanode_1         | 2023-12-28 11:17:15 WARN  Storage:420 - Failed to add 
> storage directory [DISK]file:/tmp/hadoop-hadoop/dfs/data
> datanode_1         | java.io.IOException: Incompatible clusterIDs in 
> /tmp/hadoop-hadoop/dfs/data: namenode clusterID = 
> CID-250bae07-6a8a-45ce-84bb-8828b37b10b7; datanode clusterID = 
> CID-2c1c7105-7fdf-4a19-8ef8-7cb763e5b701 {code}
> After some troubleshooting, I found out the namenode is not reusing the
> clusterID from the previous run because it cannot find it in the directory
> set by ENSURE_NAMENODE_DIR=/tmp/hadoop-root/dfs/name. This is due to a change
> in the default user of the namenode, which is now "hadoop", so the namenode
> is actually writing this information to /tmp/hadoop-hadoop/dfs/name.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
