QiangShaowei created IOTDB-6167:
-----------------------------------
Summary: DataNode can't register to the cluster when fetching the system
configuration throws an NPE
Key: IOTDB-6167
URL: https://issues.apache.org/jira/browse/IOTDB-6167
Project: Apache IoTDB
Issue Type: Bug
Components: Core/Cluster
Reporter: QiangShaowei
Fix For: master branch
In some rare circumstances, DataNode (DN) registration fails.
The reason: when a DN registers for the first time, it fetches the system
configuration from the ConfigNode (CN). If the CN has an error or its leader is
not ready, the fetched configuration is null. The DN uses it without a null
check, so an NPE aborts the DN registration process, and the
'SYSTEM_PROPERTIES.deleteOnExit();' call is skipped.
When the DN is started again, system.properties already exists, so the startup
is not treated as a first start. The nodeId is still -1, however, so the
restart fails.
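The failure mode described above, and one possible guard against it, can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual IoTDB code or patch: the class RegistrationSketch, the type GlobalConfig, and the method register are all hypothetical, and a plain marker file stands in for system.properties.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names come from the IoTDB code base.
final class RegistrationSketch {

    /** Hypothetical stand-in for the configuration pulled from the ConfigNode. */
    static final class GlobalConfig {
        final String timestampPrecision;

        GlobalConfig(String timestampPrecision) {
            this.timestampPrecision = timestampPrecision;
        }
    }

    /**
     * Simulates a first-time registration. The marker file plays the role of
     * system.properties: if it survives a failed registration, the next start
     * would be treated as a restart with an unassigned nodeId.
     *
     * @return true if registration succeeded, false if the fetched config was null
     */
    static boolean register(File systemProperties, Supplier<GlobalConfig> fetchConfig)
            throws IOException {
        // True only if the file did not exist before, i.e. this is a first start.
        boolean firstStart = systemProperties.createNewFile();
        boolean registered = false;
        try {
            GlobalConfig config = fetchConfig.get();
            if (config == null) {
                // Guard: the ConfigNode had an error or its leader was not ready.
                return false;
            }
            // Using the config without the guard above is where an NPE would occur.
            if (config.timestampPrecision.isEmpty()) {
                throw new IllegalStateException("empty timestamp precision");
            }
            registered = true;
            return true;
        } finally {
            // Clean up on every failure path, including unexpected exceptions,
            // so that the next start is again treated as a first start.
            if (firstStart && !registered) {
                Files.deleteIfExists(systemProperties.toPath());
            }
        }
    }
}
```

Doing the cleanup in a finally block, rather than only after a successful null check, means an unexpected exception during registration cannot leave the marker file behind.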
DN log info:
2023-09-20 21:45:29,041 | INFO | [main] | Successfully update ConfigNode:
[TEndPoint(ip:120.12.0.206, port:22259), TEndPoint(ip:120.12.0.2, port:22259),
TEndPoint(ip:120.12.0.167, port:22259)]. |
org.apache.iotdb.db.client.ConfigNodeInfo (ConfigNodeInfo.java:96)
2023-09-20 21:45:29,042 | INFO | [main] | Pulling system configurations from
the ConfigNode-leader... | org.apache.iotdb.db.service.DataNode
(DataNode.java:238)
2023-09-20 21:45:29,550 | ERROR | [main] | Failed to execute system command |
org.apache.iotdb.commons.ServerCommandLine (ServerCommandLine.java:69)
{color:#FF0000}java.lang.NullPointerException: null{color}
{color:#FF0000} at org.apache.iotdb.db.conf.IoTDBDescriptor.loadGlobalConfig(IoTDBDescriptor.java:1930){color}
    at org.apache.iotdb.db.service.DataNode.pullAndCheckSystemConfigurations(DataNode.java:275)
    at org.apache.iotdb.db.service.DataNode.doAddNode(DataNode.java:164)
    at org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:100)
    at org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:64)
    at org.apache.iotdb.db.service.DataNode.main(DataNode.java:151)
    at com.huawei.iotdb.IoTDBServer.main(IoTDBServer.java:17)
2023-09-20 21:46:02,198 | INFO | [main] | Start to read config file
file:/opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc/iotdb-common.properties
| org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:164)
2023-09-20 21:46:02,221 | INFO | [main] | Start to read config file
file:/opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc/iotdb-datanode.properties
| org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:181)
2023-09-20 21:46:02,247 | INFO | [main] | initial allocateMemoryForRead =
644245094 | org.apache.iotdb.db.conf.IoTDBDescriptor
(IoTDBDescriptor.java:1583)
2023-09-20 21:46:02,247 | INFO | [main] | initial allocateMemoryForWrite =
644245094 | org.apache.iotdb.db.conf.IoTDBDescriptor
(IoTDBDescriptor.java:1584)
2023-09-20 21:46:02,248 | INFO | [main] | initial allocateMemoryForSchema =
214748364 | org.apache.iotdb.db.conf.IoTDBDescriptor
(IoTDBDescriptor.java:1585)
2023-09-20 21:46:02,248 | INFO | [main] | initial allocateMemoryForConsensus =
214748364 | org.apache.iotdb.db.conf.IoTDBDescriptor
(IoTDBDescriptor.java:1586)
2023-09-20 21:46:02,248 | INFO | [main] | allocateMemoryForSchemaRegion =
107374182 | org.apache.iotdb.db.conf.IoTDBDescriptor
(IoTDBDescriptor.java:1710)
2023-09-20 21:46:02,250 | INFO | [main] | allocateMemoryForSchemaCache =
64424509 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1713)
2023-09-20 21:46:02,250 | INFO | [main] | allocateMemoryForPartitionCache =
21474836 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1717)
2023-09-20 21:46:02,250 | INFO | [main] | allocateMemoryForLastCache =
21474836 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1720)
2023-09-20 21:46:02,257 | INFO | [main] | try loading iotdb-common.properties
from
/opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc/iotdb-common.properties
| org.apache.iotdb.tsfile.common.conf.TSFileDescriptor
(TSFileDescriptor.java:135)
2023-09-20 21:46:02,388 | INFO | [main] | IoTDB enable memory control: true |
org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:383)
2023-09-20 21:46:02,492 | INFO | [main] | IoTDB-DataNode environment
variables:
IOTDB_HOME=/opt/Bigdata/FusionInsight_IoTDB_8.3.0/install/FusionInsight-IoTDB-1.1.0/iotdb;
IOTDB_CONF=/opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc;
IOTDB_DATA_HOME=null; | org.apache.iotdb.db.service.DataNode
(DataNode.java:150)
2023-09-20 21:46:02,777 | INFO | [main] | new single scheduled thread pool:
Stateful-Trigger-Information-Updater |
org.apache.iotdb.commons.concurrent.IoTDBThreadPoolFactory
(IoTDBThreadPoolFactory.java:192)
2023-09-20 21:46:02,781 | INFO | [main] | Running mode -s |
org.apache.iotdb.db.service.DataNodeServerCommandLine
(DataNodeServerCommandLine.java:96)
2023-09-20 21:46:02,790 | INFO | [main] | Starting IoTDB
1.1.0-h0.cbu.mrs.330.r3 (Build: 89ddf14-dev) |
org.apache.iotdb.db.conf.IoTDBStartCheck (IoTDBStartCheck.java:174)
2023-09-20 21:46:02,815 | WARN | [main] | Failed to copy file from
/srv/BigData/data1/iotdb/iotdbserver/system/schema/system.properties.tmp to
/srv/BigData/data1/iotdb/iotdbserver/data/system.properties |
org.apache.iotdb.db.conf.IoTDBStartCheck (IoTDBStartCheck.java:421)
2023-09-20 21:46:02,822 | INFO | [main] | Start JMX remotely: JMX is enabled
to receive remote connection on port 22258 |
org.apache.iotdb.commons.service.StartupChecks (StartupChecks.java:80)
2023-09-20 21:46:02,823 | INFO | [main] | JDK version is 8. |
org.apache.iotdb.commons.service.StartupChecks (StartupChecks.java:49)
2023-09-20 21:46:02,832 | INFO | [main] | Successfully update ConfigNode:
[TEndPoint(ip:120.12.0.206, port:22259), TEndPoint(ip:120.12.0.2, port:22259),
TEndPoint(ip:120.12.0.167, port:22259)]. |
org.apache.iotdb.db.client.ConfigNodeInfo (ConfigNodeInfo.java:96)
2023-09-20 21:46:02,835 | INFO | [main] | Pulling system configurations from
the ConfigNode-leader... | org.apache.iotdb.db.service.DataNode
(DataNode.java:238)
2023-09-20 21:46:03,514 | WARN | [main] | Failed to connect to ConfigNode
TEndPoint(ip:120.12.0.167, port:22259) from DataNode TEndPoint(ip:120.12.0.167,
port:22260), because the current node is not leader, try next node |
org.apache.iotdb.db.client.ConfigNodeClient (ConfigNodeClient.java:308)
2023-09-20 21:46:04,760 | INFO | [main] | Create system.properties.tmp
/srv/BigData/data1/iotdb/iotdbserver/system/schema/system.properties.tmp. |
org.apache.iotdb.db.conf.IoTDBStartCheck (IoTDBStartCheck.java:537)
2023-09-20 21:46:04,764 | INFO | [main] | Successfully pull system
configurations from ConfigNode-leader. | org.apache.iotdb.db.service.DataNode
(DataNode.java:306)
2023-09-20 21:46:04,764 | INFO | [main] | Sending restart request to
ConfigNode-leader... | org.apache.iotdb.db.service.DataNode (DataNode.java:405)
2023-09-20 21:46:04,807 | ERROR | [main] | Fail to start server |
{color:#FF0000}org.apache.iotdb.db.service.DataNode (DataNode.java:189) {color}
{color:#FF0000}org.apache.iotdb.commons.exception.StartupException: Reject
DataNode restart. Because the nodeId of the current DataNode is -1. Possible
solutions are as follows:{color}
{color:#FF0000} 1. Delete "data" dir and retry.{color}
{color:#FF0000} at org.apache.iotdb.db.service.DataNode.sendRestartRequestToConfigNode(DataNode.java:452){color}
    at org.apache.iotdb.db.service.DataNode.doAddNode(DataNode.java:171)
    at org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:100)
    at org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:64)
    at org.apache.iotdb.db.service.DataNode.main(DataNode.java:151)
    at com.huawei.iotdb.IoTDBServer.main(IoTDBServer.java:17)
2023-09-20 21:46:04,808 | INFO | [main] | Deactivating IoTDB DataNode... |
org.apache.iotdb.db.service.DataNode (DataNode.java:864)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)