[
https://issues.apache.org/jira/browse/IOTDB-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
QiangShaowei reassigned IOTDB-6167:
-----------------------------------
Assignee: QiangShaowei
> DataNode can't register to cluster when fetch system configuration throws NPE
> -----------------------------------------------------------------------------
>
> Key: IOTDB-6167
> URL: https://issues.apache.org/jira/browse/IOTDB-6167
> Project: Apache IoTDB
> Issue Type: Bug
> Components: Core/Cluster
> Reporter: QiangShaowei
> Assignee: QiangShaowei
> Priority: Major
> Fix For: master branch
>
>
> In rare circumstances, DataNode registration fails.
> The cause: when a DataNode (DN) registers for the first time, it fetches the
> system configuration from the ConfigNode (CN). If the ConfigNode has an error
> or its leader is not ready, the fetched configuration is null. The DN uses it
> without a null check, so a NullPointerException (NPE) aborts the registration
> process and the 'SYSTEM_PROPERTIES.deleteOnExit();' logic is skipped.
> When the DN is started again, system.properties already exists, so the start
> is not treated as a first start, but nodeId is still -1 and the restart fails.
>
> DN log info:
>
> 2023-09-20 21:45:29,041 | INFO | [main] | Successfully update ConfigNode:
> [TEndPoint(ip:120.12.0.206, port:22259), TEndPoint(ip:120.12.0.2,
> port:22259), TEndPoint(ip:120.12.0.167, port:22259)]. |
> org.apache.iotdb.db.client.ConfigNodeInfo (ConfigNodeInfo.java:96)
> 2023-09-20 21:45:29,042 | INFO | [main] | Pulling system configurations from
> the ConfigNode-leader... | org.apache.iotdb.db.service.DataNode
> (DataNode.java:238)
> 2023-09-20 21:45:29,550 | ERROR | [main] | Failed to execute system command |
> org.apache.iotdb.commons.ServerCommandLine (ServerCommandLine.java:69)
> java.lang.NullPointerException: null
> at
> org.apache.iotdb.db.conf.IoTDBDescriptor.loadGlobalConfig(IoTDBDescriptor.java:1930)
> at
> org.apache.iotdb.db.service.DataNode.pullAndCheckSystemConfigurations(DataNode.java:275)
> at org.apache.iotdb.db.service.DataNode.doAddNode(DataNode.java:164)
> at
> org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:100)
> at
> org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:64)
> at org.apache.iotdb.db.service.DataNode.main(DataNode.java:151)
> at com.huawei.iotdb.IoTDBServer.main(IoTDBServer.java:17)
> 2023-09-20 21:46:02,198 | INFO | [main] | Start to read config file
> file:/opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc/iotdb-common.properties
> | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:164)
> 2023-09-20 21:46:02,221 | INFO | [main] | Start to read config file
> file:/opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc/iotdb-datanode.properties
> | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:181)
> 2023-09-20 21:46:02,247 | INFO | [main] | initial allocateMemoryForRead =
> 644245094 | org.apache.iotdb.db.conf.IoTDBDescriptor
> (IoTDBDescriptor.java:1583)
> 2023-09-20 21:46:02,247 | INFO | [main] | initial allocateMemoryForWrite =
> 644245094 | org.apache.iotdb.db.conf.IoTDBDescriptor
> (IoTDBDescriptor.java:1584)
> 2023-09-20 21:46:02,248 | INFO | [main] | initial allocateMemoryForSchema =
> 214748364 | org.apache.iotdb.db.conf.IoTDBDescriptor
> (IoTDBDescriptor.java:1585)
> 2023-09-20 21:46:02,248 | INFO | [main] | initial allocateMemoryForConsensus
> = 214748364 | org.apache.iotdb.db.conf.IoTDBDescriptor
> (IoTDBDescriptor.java:1586)
> 2023-09-20 21:46:02,248 | INFO | [main] | allocateMemoryForSchemaRegion =
> 107374182 | org.apache.iotdb.db.conf.IoTDBDescriptor
> (IoTDBDescriptor.java:1710)
> 2023-09-20 21:46:02,250 | INFO | [main] | allocateMemoryForSchemaCache =
> 64424509 | org.apache.iotdb.db.conf.IoTDBDescriptor
> (IoTDBDescriptor.java:1713)
> 2023-09-20 21:46:02,250 | INFO | [main] | allocateMemoryForPartitionCache =
> 21474836 | org.apache.iotdb.db.conf.IoTDBDescriptor
> (IoTDBDescriptor.java:1717)
> 2023-09-20 21:46:02,250 | INFO | [main] | allocateMemoryForLastCache =
> 21474836 | org.apache.iotdb.db.conf.IoTDBDescriptor
> (IoTDBDescriptor.java:1720)
> 2023-09-20 21:46:02,257 | INFO | [main] | try loading
> iotdb-common.properties from
> /opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc/iotdb-common.properties
> | org.apache.iotdb.tsfile.common.conf.TSFileDescriptor
> (TSFileDescriptor.java:135)
> 2023-09-20 21:46:02,388 | INFO | [main] | IoTDB enable memory control: true
> | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:383)
> 2023-09-20 21:46:02,492 | INFO | [main] | IoTDB-DataNode environment
> variables:
>
> IOTDB_HOME=/opt/Bigdata/FusionInsight_IoTDB_8.3.0/install/FusionInsight-IoTDB-1.1.0/iotdb;
> IOTDB_CONF=/opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc;
> IOTDB_DATA_HOME=null; | org.apache.iotdb.db.service.DataNode
> (DataNode.java:150)
> 2023-09-20 21:46:02,777 | INFO | [main] | new single scheduled thread pool:
> Stateful-Trigger-Information-Updater |
> org.apache.iotdb.commons.concurrent.IoTDBThreadPoolFactory
> (IoTDBThreadPoolFactory.java:192)
> 2023-09-20 21:46:02,781 | INFO | [main] | Running mode -s |
> org.apache.iotdb.db.service.DataNodeServerCommandLine
> (DataNodeServerCommandLine.java:96)
> 2023-09-20 21:46:02,790 | INFO | [main] | Starting IoTDB
> 1.1.0-h0.cbu.mrs.330.r3 (Build: 89ddf14-dev) |
> org.apache.iotdb.db.conf.IoTDBStartCheck (IoTDBStartCheck.java:174)
> 2023-09-20 21:46:02,815 | WARN | [main] | Failed to copy file from
> /srv/BigData/data1/iotdb/iotdbserver/system/schema/system.properties.tmp to
> /srv/BigData/data1/iotdb/iotdbserver/data/system.properties |
> org.apache.iotdb.db.conf.IoTDBStartCheck (IoTDBStartCheck.java:421)
> 2023-09-20 21:46:02,822 | INFO | [main] | Start JMX remotely: JMX is enabled
> to receive remote connection on port 22258 |
> org.apache.iotdb.commons.service.StartupChecks (StartupChecks.java:80)
> 2023-09-20 21:46:02,823 | INFO | [main] | JDK version is 8. |
> org.apache.iotdb.commons.service.StartupChecks (StartupChecks.java:49)
> 2023-09-20 21:46:02,832 | INFO | [main] | Successfully update ConfigNode:
> [TEndPoint(ip:120.12.0.206, port:22259), TEndPoint(ip:120.12.0.2,
> port:22259), TEndPoint(ip:120.12.0.167, port:22259)]. |
> org.apache.iotdb.db.client.ConfigNodeInfo (ConfigNodeInfo.java:96)
> 2023-09-20 21:46:02,835 | INFO | [main] | Pulling system configurations from
> the ConfigNode-leader... | org.apache.iotdb.db.service.DataNode
> (DataNode.java:238)
> 2023-09-20 21:46:03,514 | WARN | [main] | Failed to connect to ConfigNode
> TEndPoint(ip:120.12.0.167, port:22259) from DataNode
> TEndPoint(ip:120.12.0.167, port:22260), because the current node is not
> leader, try next node | org.apache.iotdb.db.client.ConfigNodeClient
> (ConfigNodeClient.java:308)
> 2023-09-20 21:46:04,760 | INFO | [main] | Create system.properties.tmp
> /srv/BigData/data1/iotdb/iotdbserver/system/schema/system.properties.tmp. |
> org.apache.iotdb.db.conf.IoTDBStartCheck (IoTDBStartCheck.java:537)
> 2023-09-20 21:46:04,764 | INFO | [main] | Successfully pull system
> configurations from ConfigNode-leader. | org.apache.iotdb.db.service.DataNode
> (DataNode.java:306)
> 2023-09-20 21:46:04,764 | INFO | [main] | Sending restart request to
> ConfigNode-leader... | org.apache.iotdb.db.service.DataNode
> (DataNode.java:405)
> 2023-09-20 21:46:04,807 | ERROR | [main] | Fail to start server |
> org.apache.iotdb.db.service.DataNode (DataNode.java:189)
> org.apache.iotdb.commons.exception.StartupException: Reject
> DataNode restart. Because the nodeId of the current DataNode is -1. Possible
> solutions are as follows:
> 1. Delete "data" dir and retry.
> at
> org.apache.iotdb.db.service.DataNode.sendRestartRequestToConfigNode(DataNode.java:452)
> at org.apache.iotdb.db.service.DataNode.doAddNode(DataNode.java:171)
> at
> org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:100)
> at
> org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:64)
> at org.apache.iotdb.db.service.DataNode.main(DataNode.java:151)
> at com.huawei.iotdb.IoTDBServer.main(IoTDBServer.java:17)
> 2023-09-20 21:46:04,808 | INFO | [main] | Deactivating IoTDB DataNode... |
> org.apache.iotdb.db.service.DataNode (DataNode.java:864)
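
The failure mode described in the issue (a null configuration used without a check, followed by a skipped cleanup of system.properties) can be illustrated with a small sketch. This is not the actual IoTDB fix: the class RegistrationGuardSketch, the method registerFirstTime, the exception ConfigUnavailableException, and the use of a plain Map for the configuration are all hypothetical names chosen for illustration. The sketch only shows the general idea of rejecting a null result explicitly and removing the half-written properties file when first-time registration does not complete.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names are IoTDB APIs. */
public class RegistrationGuardSketch {

    /** Raised instead of an NPE when no configuration could be fetched. */
    public static class ConfigUnavailableException extends Exception {
        public ConfigUnavailableException(String message) {
            super(message);
        }
    }

    /**
     * Simulates first-time registration. The fetcher stands in for the call
     * that pulls the system configuration from the ConfigNode leader.
     * If registration does not complete, the properties file is deleted so
     * that the next start is still treated as a first start.
     */
    public static Map<String, String> registerFirstTime(
            Supplier<Map<String, String>> fetcher, Path systemProperties)
            throws ConfigUnavailableException, IOException {
        boolean registered = false;
        try {
            Map<String, String> config = fetcher.get();
            // Explicit null check: fail with a descriptive error, not an NPE.
            if (config == null) {
                throw new ConfigUnavailableException(
                        "ConfigNode returned no configuration; leader may not be ready");
            }
            registered = true;
            return config;
        } finally {
            // Cleanup runs on every failure path, including unexpected ones.
            if (!registered) {
                Files.deleteIfExists(systemProperties);
            }
        }
    }
}
```

With this shape, a caller could catch ConfigUnavailableException and retry later, and a failed first registration would not leave behind a properties file carrying nodeId -1.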