[
https://issues.apache.org/jira/browse/IOTDB-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638667#comment-17638667
]
Jinrui Zhang commented on IOTDB-5030:
-------------------------------------
[~HeimingZ] tried to reproduce this issue on fit16-20 with a 3C5D cluster, but
the issue did not occur.
We investigated the logs from when the issue occurred and found that it is most
likely a timeout issue. At that time, fit68 was trying to dispatch a schema-read
FI (fragment instance) to fit66, but the response was not returned within the
timeout. However, the FI actually executed successfully on fit66, since we did
not see any error log there. Meanwhile, we found a `Read timeout` log on fit68
at that time. This indicates that the schema-read operation is not as fast as we
expected under this load, so it did not return the response within a tolerable
interval.
According to the benchmark settings, there are 30 million (3kw) series in the
schema, which is huge. There are currently two ways to resolve this issue:
# optimize the schema-read execution to avoid the timeout
# let users increase `connection_timeout_ms` in the configuration to
accommodate the huge load.
However, the optimization certainly cannot be completed in a short time. Given
that 1.0 is in its release stage, I will lower the priority of this issue. The
optimization also needs [~Marcoss] to take a look.
> java.lang.IllegalArgumentException: all replicas for
> region[TConsensusGroupId(type:SchemaRegion, id:6)] are not available in these
> DataNodes
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: IOTDB-5030
> URL: https://issues.apache.org/jira/browse/IOTDB-5030
> Project: Apache IoTDB
> Issue Type: Bug
> Components: mpp-cluster
> Affects Versions: 0.14.0-SNAPSHOT
> Reporter: 刘珍
> Assignee: Jinrui Zhang
> Priority: Minor
> Attachments: iotdb_4851.conf
>
>
> master_1123_32e2f98
> 1. Start a 3C5D cluster with 1 replica.
> 2. Write data with BM (benchmark); after 50 minutes, ip68 reports the following error:
> {color:#DE350B}2022-11-23 15:32:46,820
> [pool-24-IoTDB-DataNodeInternalRPC-Processor-122] ERROR
> o.a.t.ProcessFunction:47 - Internal error processing sendPlanNode
> java.lang.IllegalArgumentException: all replicas for
> region[TConsensusGroupId(type:SchemaRegion, id:1)] are not available in these
> DataNodes[[TDataNodeLocation(dataNodeId:4,
> clientRpcEndPoint:TEndPoint(ip:192.168.10.66, port:6667),
> internalEndPoint:TEndPoint(ip:192.168.10.66, port:9003),
> mPPDataExchangeEndPoint:TEndPoint(ip:192.168.10.66, port:8777),
> dataRegionConsensusEndPoint:TEndPoint(ip:192.168.10.66, port:40010),
> schemaRegionConsensusEndPoint:TEndPoint(ip:192.168.10.66,
> port:50010))]]{color}
> at org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.selectTargetDataNode(SimpleFragmentParallelPlanner.java:146)
> at org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.produceFragmentInstance(SimpleFragmentParallelPlanner.java:115)
> at org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.prepare(SimpleFragmentParallelPlanner.java:87)
> at org.apache.iotdb.db.mpp.plan.planner.distribution.SimpleFragmentParallelPlanner.parallelPlan(SimpleFragmentParallelPlanner.java:78)
> at org.apache.iotdb.db.mpp.plan.planner.distribution.DistributionPlanner.planFragmentInstances(DistributionPlanner.java:94)
> at org.apache.iotdb.db.mpp.plan.planner.distribution.DistributionPlanner.planFragments(DistributionPlanner.java:78)
> at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.doDistributedPlan(QueryExecution.java:304)
> at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.start(QueryExecution.java:201)
> at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.retry(QueryExecution.java:235)
> at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.getStatus(QueryExecution.java:500)
> at org.apache.iotdb.db.mpp.plan.Coordinator.execute(Coordinator.java:152)
> at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.executeSchemaFetchQuery(ClusterSchemaFetcher.java:178)
> at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:156)
> at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:98)
> at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchemaWithAutoCreate(ClusterSchemaFetcher.java:265)
> at org.apache.iotdb.db.mpp.plan.analyze.SchemaValidator.validate(SchemaValidator.java:56)
> at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.executeDataInsert(RegionWriteExecutor.java:193)
> at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:165)
> at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:119)
> at org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.accept(InsertTabletNode.java:1086)
> at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.execute(RegionWriteExecutor.java:85)
> at org.apache.iotdb.db.service.thrift.impl.DataNodeInternalRPCServiceImpl.sendPlanNode(DataNodeInternalRPCServiceImpl.java:283)
> at org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3607)
> at org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3587)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Test environment
> 1. 192.168.10.62/66/64/68: 72 CPUs, 256 GB
> 192.168.10.76: 48 CPUs, 384 GB
> 3C : 62,66,68
> ConfigNode
> MAX_HEAP_SIZE="8G"
> DataNode
> MAX_HEAP_SIZE="192G"
> MAX_DIRECT_MEMORY_SIZE="32G"
> Common
> max_connection_for_internal_service=300
> query_timeout_threshold=3600000
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
> schema_replication_factor=1
> data_replication_factor=1
> 2. Write data with benchmark; see the attachment for the configuration.
> After about 50 minutes, the error shown in the log above occurred.
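The stack trace shows the exception thrown from `SimpleFragmentParallelPlanner.selectTargetDataNode` while `QueryExecution.retry` re-plans the schema fetch, and the configuration uses `schema_replication_factor=1`, so each SchemaRegion has exactly one replica. Assuming (this is not confirmed in the comment) that the retry excludes the DataNode that just timed out, the only replica holder is filtered out and no candidate remains. The following is a simplified, hypothetical model of that selection step; `ReplicaSelectionSketch`, its method, and the endpoint strings are illustrative and are not IoTDB's actual implementation.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class ReplicaSelectionSketch {
    /**
     * Hypothetical: pick a DataNode holding a replica of the region, skipping
     * endpoints the caller has excluded (assumed here: nodes that timed out).
     */
    static String selectTargetDataNode(String regionId, List<String> replicaLocations,
                                       Set<String> excludedEndpoints) {
        List<String> available = replicaLocations.stream()
                .filter(loc -> !excludedEndpoints.contains(loc))
                .collect(Collectors.toList());
        if (available.isEmpty()) {
            // Message format copied from the log above.
            throw new IllegalArgumentException(String.format(
                    "all replicas for region[%s] are not available in these DataNodes[%s]",
                    regionId, replicaLocations));
        }
        return available.get(0);
    }

    public static void main(String[] args) {
        // schema_replication_factor=1: the SchemaRegion has a single replica on fit66.
        List<String> replicas = List.of("192.168.10.66:9003");
        System.out.println(selectTargetDataNode("SchemaRegion-1", replicas, Set.of()));
        try {
            // Assumed retry path: fit66 is excluded, so no replica is left.
            selectTargetDataNode("SchemaRegion-1", replicas, Set.of("192.168.10.66:9003"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a replication factor above 1, the same filter would still leave another replica to try, which is one reason the error surfaces in this single-replica setup.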