[ 
https://issues.apache.org/jira/browse/IOTDB-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403692#comment-17403692
 ] 

Jianyun Cheng commented on IOTDB-1583:
--------------------------------------

Appending more logs from the point where the error starts.
{code:java}
2021-08-18 14:03:58,027 [Data(10.228.72.71:9003, raftId=0)-SerialToParallel0] INFO  o.a.i.c.s.m.RaftMember:737 - Data(10.228.72.71:9003, raftId=0): Start to make Node(internalIp:10.228.72.135, metaPort:9003, nodeIdentifier:1621312527, dataPort:40010, clientPort:6667, clientIp:0.0.0.0) catch up
2021-08-18 14:03:58,727 [DataClientThread-64] INFO  o.a.i.c.l.m.s.SyncLogDequeSerializer:284 - Raft log buffer overflow!
2021-08-18 14:03:58,737 [Data(10.228.72.71:9003, raftId=0)-CatchUpThread24] INFO  o.a.i.c.l.c.CatchUpTask:97 - Data(10.228.72.71:9003, raftId=0): use 1 logs of [50000606, 50000607] to fix log inconsistency with node [Node(internalIp:10.228.72.135, metaPort:9003, nodeIdentifier:1621312527, dataPort:40010, clientPort:6667, clientIp:0.0.0.0)], local first index: 49998998
2021-08-18 14:03:58,737 [DataClientThread-64] ERROR o.a.i.c.s.m.RaftMember:1571 - RuntimeException during executing org.apache.iotdb.db.qp.physical.sys.DeleteTimeSeriesPlan@65ef777e,term:1,index:50000606
java.nio.BufferOverflowException: null
        at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:206)
        at org.apache.iotdb.cluster.log.manage.serializable.SyncLogDequeSerializer.putLogs(SyncLogDequeSerializer.java:290)
        at org.apache.iotdb.cluster.log.manage.serializable.SyncLogDequeSerializer.append(SyncLogDequeSerializer.java:243)
        at org.apache.iotdb.cluster.log.manage.RaftLogManager.commitTo(RaftLogManager.java:627)
        at org.apache.iotdb.cluster.server.member.RaftMember.commitLog(RaftMember.java:1533)
        at org.apache.iotdb.cluster.server.member.RaftMember.appendLogInGroup(RaftMember.java:1699)
        at org.apache.iotdb.cluster.server.member.RaftMember.processPlanLocally(RaftMember.java:1040)
        at org.apache.iotdb.cluster.server.member.DataGroupMember.executeNonQueryPlanWithKnownLeader(DataGroupMember.java:753)
        at org.apache.iotdb.cluster.server.member.DataGroupMember.executeNonQueryPlan(DataGroupMember.java:715)
        at org.apache.iotdb.cluster.server.member.RaftMember.executeNonQueryPlan(RaftMember.java:765)
        at org.apache.iotdb.cluster.server.service.BaseSyncService.executeNonQueryPlan(BaseSyncService.java:176)
        at org.apache.iotdb.cluster.server.DataClusterServer.executeNonQueryPlan(DataClusterServer.java:1036)
        at org.apache.iotdb.cluster.rpc.thrift.RaftService$Processor$executeNonQueryPlan.getResult(RaftService.java:918)
        at org.apache.iotdb.cluster.rpc.thrift.RaftService$Processor$executeNonQueryPlan.getResult(RaftService.java:898)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{code}

The root cause is analysed here: 
https://github.com/apache/iotdb/discussions/3784#discussioncomment-1226380
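
For readers unfamiliar with the exception at the top of the trace: HeapByteBuffer.put throws java.nio.BufferOverflowException when the data being written is larger than the space remaining in a fixed-capacity buffer. The following is only a minimal sketch of that failure mode and of a remaining() check that avoids it. It is not the IoTDB code; the class LogBufferSketch, its method names, and the length-prefixed layout are assumptions made for illustration.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Hypothetical illustration only; not taken from SyncLogDequeSerializer.
public class LogBufferSketch {

    /** Writes a length-prefixed entry only if it fits; returns false otherwise. */
    public static boolean tryPut(ByteBuffer buffer, byte[] entry) {
        int needed = Integer.BYTES + entry.length;
        if (buffer.remaining() < needed) {
            // Not enough room: the caller would have to flush or use a larger buffer.
            return false;
        }
        buffer.putInt(entry.length);
        buffer.put(entry);
        return true;
    }

    /** Writes without checking, to show when BufferOverflowException is raised. */
    public static boolean overflowsWithoutCheck(ByteBuffer buffer, byte[] entry) {
        try {
            buffer.put(entry);
            return false;
        } catch (BufferOverflowException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(16); // heap buffer with fixed capacity
        byte[] small = new byte[8];
        byte[] large = new byte[32];

        System.out.println(tryPut(buffer, small));                // true: 4 + 8 bytes fit in 16
        System.out.println(tryPut(buffer, large));                // false: only 4 bytes remain
        System.out.println(overflowsWithoutCheck(buffer, large)); // true: put() overflows
    }
}
```

Whether the real fix should be a pre-check, a flush, or a larger buffer is a design question for the linked discussion; the sketch only shows why an unchecked put can fail.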

> Raft log failed to be committed in cluster version
> --------------------------------------------------
>
>                 Key: IOTDB-1583
>                 URL: https://issues.apache.org/jira/browse/IOTDB-1583
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: Cluster
>    Affects Versions: master branch
>            Reporter: lisijia
>            Priority: Major
>
> In master 199519dd8d1497f4c640affc8989ad0777b15188, with three nodes and three 
> replicas, I have 20 storage groups and 100000 devices, and each device has 50 
> sensors. After two hours of uninterrupted writing, I tried to write again, but 
> the client write was rejected. I found that the server log was reporting an 
> error. It seems that the Raft log failed during the commit.
> {code:java}
> 2021-08-18 17:50:38,479 [DataClientThread-1100] ERROR o.a.i.c.l.m.RaftLogManager:648 - Node(internalIp: x.x.x.x, metaPort:9003, nodeIdentifier:1190416664, dataPort:40010, clientPort:6667, clientIp:0.0.0.0): Unexpected error:
> org.apache.iotdb.cluster.exception.TruncateCommittedEntryException: The committed entries cannot be truncated: parameter: 50000606, commitIndex : 50000606
>         at org.apache.iotdb.cluster.log.manage.CommittedEntryManager.append(CommittedEntryManager.java:246)
>         at org.apache.iotdb.cluster.log.manage.RaftLogManager.commitTo(RaftLogManager.java:625)
>         at org.apache.iotdb.cluster.server.member.RaftMember.commitLog(RaftMember.java:1533)
>         at org.apache.iotdb.cluster.server.member.RaftMember.appendLogInGroup(RaftMember.java:1699)
>         at org.apache.iotdb.cluster.server.member.RaftMember.processPlanLocally(RaftMember.java:1040)
>         at org.apache.iotdb.cluster.server.member.DataGroupMember.executeNonQueryPlanWithKnownLeader(DataGroupMember.java:753)
>         at org.apache.iotdb.cluster.server.member.DataGroupMember.executeNonQueryPlan(DataGroupMember.java:715)
>         at org.apache.iotdb.cluster.server.member.RaftMember.executeNonQueryPlan(RaftMember.java:765)
>         at org.apache.iotdb.cluster.server.service.BaseSyncService.executeNonQueryPlan(BaseSyncService.java:176)
>         at org.apache.iotdb.cluster.server.DataClusterServer.executeNonQueryPlan(DataClusterServer.java:1036)
>         at org.apache.iotdb.cluster.rpc.thrift.RaftService$Processor$executeNonQueryPlan.getResult(RaftService.java:918)
>         at org.apache.iotdb.cluster.rpc.thrift.RaftService$Processor$executeNonQueryPlan.getResult(RaftService.java:898)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
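
Judging only from the message text above (parameter: 50000606, commitIndex : 50000606), the exception appears to guard the usual Raft rule that an entry at or below the commit index may not be written again. The sketch below illustrates that rule in isolation. It is not the CommittedEntryManager code; the class CommitIndexSketch, its methods, and the starting index 50000605 are assumptions for illustration, and IllegalStateException stands in for TruncateCommittedEntryException.

```java
// Hypothetical illustration only; not taken from CommittedEntryManager.
public class CommitIndexSketch {
    private long commitIndex;

    public CommitIndexSketch(long commitIndex) {
        this.commitIndex = commitIndex;
    }

    public long getCommitIndex() {
        return commitIndex;
    }

    /** Commits the entry directly after commitIndex; rejects already-committed indexes. */
    public void commit(long entryIndex) {
        if (entryIndex <= commitIndex) {
            // Re-committing would overwrite (truncate) an entry that is already committed.
            throw new IllegalStateException(
                "The committed entries cannot be truncated: parameter: "
                    + entryIndex + ", commitIndex : " + commitIndex);
        }
        if (entryIndex != commitIndex + 1) {
            throw new IllegalArgumentException("missing entries before index " + entryIndex);
        }
        commitIndex = entryIndex;
    }

    public static void main(String[] args) {
        CommitIndexSketch log = new CommitIndexSketch(50000605L);
        log.commit(50000606L); // first commit of this index succeeds
        try {
            log.commit(50000606L); // second attempt with the same index is rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch, a second commit attempt for an index that has already been recorded is what produces a message of the same shape as the one in the log.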



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
