[ https://issues.apache.org/jira/browse/HDFS-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800989#comment-17800989 ]
ASF GitHub Bot commented on HDFS-17307:
---------------------------------------

matthewrossi opened a new pull request, #6387:
URL: https://github.com/apache/hadoop/pull/6387

Restarting existing services using the docker-compose.yaml causes the datanode to crash after a few seconds.

How to reproduce:

```shell
$ docker-compose up -d   # everything starts ok
$ docker-compose stop    # stop services without removing containers
$ docker-compose up -d   # everything starts, but datanode crashes after a few seconds
```

The log produced by the datanode suggests the issue is due to a mismatch between the clusterIDs of the namenode and the datanode:

```
datanode_1  | 2023-12-28 11:17:15 WARN  Storage:420 - Failed to add storage directory [DISK]file:/tmp/hadoop-hadoop/dfs/data
datanode_1  | java.io.IOException: Incompatible clusterIDs in /tmp/hadoop-hadoop/dfs/data: namenode clusterID = CID-250bae07-6a8a-45ce-84bb-8828b37b10b7; datanode clusterID = CID-2c1c7105-7fdf-4a19-8ef8-7cb763e5b701
```

After some troubleshooting I found out the namenode is not reusing the clusterID of the previous run because it cannot find it in the directory set by ENSURE_NAMENODE_DIR=/tmp/hadoop-root/dfs/name. This is due to a change of the default user of the namenode, which is now "hadoop", so the namenode actually writes this information to /tmp/hadoop-hadoop/dfs/name.

See [https://issues.apache.org/jira/browse/HDFS-17307](https://issues.apache.org/jira/browse/HDFS-17307)

> docker-compose.yaml sets namenode directory wrong causing datanode failures on restart
> --------------------------------------------------------------------------------------
>
>                 Key: HDFS-17307
>                 URL: https://issues.apache.org/jira/browse/HDFS-17307
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>            Reporter: Matthew Rossi
>            Priority: Major
>
> Restarting existing services using the docker-compose.yaml causes the datanode to crash after a few seconds.
>
> How to reproduce:
> {code:java}
> $ docker-compose up -d   # everything starts ok
> $ docker-compose stop    # stop services without removing containers
> $ docker-compose up -d   # everything starts, but datanode crashes after a few seconds
> {code}
> The log produced by the datanode suggests the issue is due to a mismatch between the clusterIDs of the namenode and the datanode:
> {code:java}
> datanode_1  | 2023-12-28 11:17:15 WARN  Storage:420 - Failed to add storage directory [DISK]file:/tmp/hadoop-hadoop/dfs/data
> datanode_1  | java.io.IOException: Incompatible clusterIDs in /tmp/hadoop-hadoop/dfs/data: namenode clusterID = CID-250bae07-6a8a-45ce-84bb-8828b37b10b7; datanode clusterID = CID-2c1c7105-7fdf-4a19-8ef8-7cb763e5b701
> {code}
> After some troubleshooting I found out the namenode is not reusing the clusterID of the previous run because it cannot find it in the directory set by ENSURE_NAMENODE_DIR=/tmp/hadoop-root/dfs/name. This is due to a change of the default user of the namenode, which is now "hadoop", so the namenode actually writes this information to /tmp/hadoop-hadoop/dfs/name.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
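
A note on verifying the diagnosis above: HDFS records the clusterID in the `current/VERSION` file of each storage directory, so comparing those two files confirms the mismatch directly. The sketch below demonstrates the check against mock directories (created locally, with the clusterIDs from the report); on a live compose cluster you would instead read the real files, e.g. `/tmp/hadoop-hadoop/dfs/name` and `/tmp/hadoop-hadoop/dfs/data` inside the containers via `docker-compose exec` (the service names and paths here are assumptions based on the log above, not verified against the compose file).

```shell
#!/bin/sh
# Extract the clusterID from an HDFS storage directory's current/VERSION file.
cluster_id() {
  grep '^clusterID=' "$1/current/VERSION" | cut -d= -f2
}

# Mock storage directories standing in for the namenode/datanode dirs,
# seeded with the clusterIDs reported in the datanode log above.
mkdir -p /tmp/demo-name/current /tmp/demo-data/current
echo 'clusterID=CID-250bae07-6a8a-45ce-84bb-8828b37b10b7' > /tmp/demo-name/current/VERSION
echo 'clusterID=CID-2c1c7105-7fdf-4a19-8ef8-7cb763e5b701' > /tmp/demo-data/current/VERSION

nn=$(cluster_id /tmp/demo-name)
dn=$(cluster_id /tmp/demo-data)
if [ "$nn" = "$dn" ]; then
  echo "clusterIDs match: $nn"
else
  echo "MISMATCH: namenode=$nn datanode=$dn"
fi
```

With the values above this prints the MISMATCH line, which is exactly the condition that makes the datanode refuse to register its storage directory.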