[
https://issues.apache.org/jira/browse/IGNITE-15257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477058#comment-17477058
]
Vladislav Pyatkov edited comment on IGNITE-15257 at 1/17/22, 9:32 AM:
----------------------------------------------------------------------
The reason of the issue in using of Disruptor tasks.
The tasks must not use after the Disruptor handler has processed these tasks,
but before the patch, we executed a task in the another pool.
The solution is extracts a closure from the disruptor task and process it in a
different thread.
was (Author: v.pyatkov):
The reason of the issue in using of Disruptor tasks.
The tasks must not use after the Disruptor handler has processed these tasks,
but before the patch, we executed a task in the another pool.
> ITNodeTest.testNodeTaskOverload is flaky
> ----------------------------------------
>
> Key: IGNITE-15257
> URL: https://issues.apache.org/jira/browse/IGNITE-15257
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 3.0.0-alpha2
> Reporter: Ivan Bessonov
> Assignee: Vladislav Pyatkov
> Priority: Blocker
> Labels: ignite-3
>
> Test failed locally:
> {code:java}
> Aug 05, 2021 10:02:29 AM org.apache.ignite.lang.IgniteLogger logInternalAug
> 05, 2021 10:02:29 AM org.apache.ignite.lang.IgniteLogger logInternalINFO:
> >>>>>>>>>>>>>>> Start test method: testNodeTaskOverload()Aug 05, 2021
> 10:02:29 AM org.apache.ignite.lang.IgniteLogger logInternalINFO: Connection
> created [address=/0:0:0:0:0:0:0:0:5003]Aug 05, 2021 10:02:29 AM
> org.apache.ignite.lang.IgniteLogger logInternalINFO: Node joined: ClusterNode
> [id=5f83a413-f3b0-406f-93fd-74ecc2ee1370, name=127.0.1.1:5003,
> address=127.0.1.1:5003]Aug 05, 2021 10:02:29 AM
> org.apache.ignite.lang.IgniteLogger logInternalINFO: Topology snapshot
> [nodes=1] ^-- ClusterNode [id=5f83a413-f3b0-406f-93fd-74ecc2ee1370,
> name=127.0.1.1:5003, address=127.0.1.1:5003]Aug 05, 2021 10:02:30 AM
> org.apache.ignite.lang.IgniteLogger logInternalINFO: Starts FSMCaller
> successfully.Aug 05, 2021 10:02:30 AM org.apache.ignite.lang.IgniteLogger
> logInternalWARNING: No data for snapshot reader
> target/work/ITNodeTest/testNodeTaskOverload_1628146949951/snapshot.Aug 05,
> 2021 10:02:30 AM org.apache.ignite.lang.IgniteLogger logInternalINFO: Node
> <unittest/127.0.1.1:5003> init, term=0, lastLogId=LogId [index=0, term=0],
> conf=127.0.1.1:5003, oldConf=.Aug 05, 2021 10:02:30 AM
> org.apache.ignite.lang.IgniteLogger logInternalINFO: Node
> <unittest/127.0.1.1:5003> start vote and grant vote self, term=0.Aug 05, 2021
> 10:02:30 AM org.apache.ignite.lang.IgniteLogger logInternalINFO: Save raft
> meta, path=target/work/ITNodeTest/testNodeTaskOverload_1628146949951/meta,
> term=1, votedFor=127.0.1.1:5003, cost time=7 msAug 05, 2021 10:02:30 AM
> org.apache.ignite.lang.IgniteLogger logInternalINFO: Node
> <unittest/127.0.1.1:5003> become leader of group, term=1,
> conf=127.0.1.1:5003, oldConf=.Aug 05, 2021 10:02:30 AM
> org.apache.ignite.lang.IgniteLogger logInternalINFO: Start the
> RaftGroupService successfully <unittest/127.0.1.1:5003>Aug 05, 2021 10:02:30
> AM org.apache.ignite.lang.IgniteLogger logInternalWARNING: Node
> <unittest/127.0.1.1:5003> applyQueue is overload.Status[EBUSY<1009>: Node is
> busy, has too many tasks.]Aug 05, 2021 10:02:30 AM
> org.apache.ignite.lang.IgniteLogger logInternalINFO: onLeaderStart:
> term=1.Aug 05, 2021 10:02:30 AM org.apache.ignite.lang.IgniteLogger
> logInternalSEVERE: Node <unittest/127.0.1.1:5003> append [2, 2] failed,
> status=Status[EIO<1014>: Corrupted LogStorage].Aug 05, 2021 10:02:30 AM
> org.apache.ignite.lang.IgniteLogger logInternalSEVERE: Encountered an
> error=Status[EBUSY<1009>: LogManager is busy, disk queue overload.] on
> StateMachine org.apache.ignite.raft.jraft.core.MockStateMachine, it's highly
> recommended to implement this method as raft stops working since some error
> occurs, you should figure out the cause and repair or remove this node.Error
> [type=ERROR_TYPE_LOG, status=Status[EBUSY<1009>: LogManager is busy, disk
> queue overload.]] at
> org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.reportError(LogManagerImpl.java:577)Status[EBUSY<1009>:
> Node is busy, has too many tasks.] at
> org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.appendEntries(LogManagerImpl.java:332)
> at
> org.apache.ignite.raft.jraft.core.NodeImpl.executeApplyingTasks(NodeImpl.java:1371)
> at
> org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosureHandler.onEvent(NodeImpl.java:292)
> at
> org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosureHandler.onEvent(NodeImpl.java:275)
> at
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:214)
> at
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:179)
> at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Aug 05, 2021 10:02:30 AM org.apache.ignite.lang.IgniteLogger
> logInternalWARNING: Node <unittest/127.0.1.1:5003> got error: Error
> [type=ERROR_TYPE_LOG, status=Status[EBUSY<1009>: LogManager is busy, disk
> queue overload.]].Aug 05, 2021 10:02:30 AM
> org.apache.ignite.lang.IgniteLogger warnWARNING: FSMCaller already in error
> status, ignore new errorError [type=ERROR_TYPE_LOG,
> status=Status[EBUSY<1009>: LogManager is busy, disk queue overload.]] at
> org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.reportError(LogManagerImpl.java:577)
> at
> org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.appendEntries(LogManagerImpl.java:332)
> at
> org.apache.ignite.raft.jraft.core.NodeImpl.executeApplyingTasks(NodeImpl.java:1371)
> at
> org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosureHandler.onEvent(NodeImpl.java:292)
> at
> org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosureHandler.onEvent(NodeImpl.java:275)
> at
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:214)
> at
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:179)
> at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Aug 05, 2021 10:02:30 AM org.apache.ignite.lang.IgniteLogger
> logInternalWARNING: Node <unittest/127.0.1.1:5003> applyQueue is overload.Aug
> 05, 2021 10:02:30 AM org.apache.ignite.lang.IgniteLogger logInternalINFO:
> Fail to find the next candidate.Aug 05, 2021 10:02:30 AM
> org.apache.ignite.lang.IgniteLogger logInternalStatus[EPERM<1008>: Leader
> stepped down]SEVERE: Node <unittest/127.0.1.1:5003> append [0, 0] failed,
> status=Status[EIO<1014>: Corrupted LogStorage].Status[EPERM<1008>: Leader
> stepped down]Status[EPERM<1008>: Leader stepped down]Status[EPERM<1008>:
> Leader stepped down]Aug 05, 2021 10:02:30 AM
> org.apache.ignite.lang.IgniteLogger logInternalINFO: onLeaderStop:
> status=Status[EBADNODE<10009>: Raft node(leader or candidate) is in
> error.].Aug 05, 2021 10:02:30 AM org.apache.ignite.lang.IgniteLogger
> logInternalSEVERE: Node <unittest/127.0.1.1:5003> append [0, 0] failed,
> status=Status[EIO<1014>: Corrupted LogStorage].Aug 05, 2021 10:02:30 AM
> org.apache.ignite.lang.IgniteLogger logInternalWARNING: Node
> <unittest/127.0.1.1:5003> applyQueue is overload.Status[EPERM<1008>: Is not
> leader.]Status[EBUSY<1009>: Node is busy, has too many
> tasks.]Status[EPERM<1008>: Is not leader.]Status[EPERM<1008>: Is not
> leader.]Aug 05, 2021 10:03:00 AM org.apache.ignite.lang.IgniteLogger
> logInternalINFO: Node <unittest/127.0.1.1:5003> shutdown, currTerm=1
> state=STATE_ERROR.Aug 05, 2021 10:03:00 AM
> org.apache.ignite.lang.IgniteLogger logInternalINFO: Save raft meta,
> path=target/work/ITNodeTest/testNodeTaskOverload_1628146949951/meta, term=1,
> votedFor=127.0.1.1:5003, cost time=14 msAug 05, 2021 10:03:00 AM
> org.apache.ignite.lang.IgniteLogger logInternalINFO: Shutting down
> FSMCaller...Aug 05, 2021 10:03:00 AM org.apache.ignite.lang.IgniteLogger
> logInternalINFO: onShutdown.Aug 05, 2021 10:03:00 AM
> org.apache.ignite.lang.IgniteLogger logInternalINFO: ThreadPool is
> terminated: JRaft-Node-ScheduleThreadPool-null-,
> org.apache.ignite.raft.jraft.util.MetricScheduledThreadPoolExecutor@24f4621[Shutting
> down, pool size = 0, active threads = 0, queued tasks = 0, completed tasks =
> 30].Aug 05, 2021 10:03:00 AM org.apache.ignite.lang.IgniteLogger
> logInternalINFO: Destroy timer: RepeatedTimer{timeout=null, stopped=true,
> running=false, destroyed=true, invoking=false, timeoutMs=1000,
> name='JRaft-ElectionTimer-null-'}.Aug 05, 2021 10:03:00 AM
> org.apache.ignite.lang.IgniteLogger logInternalINFO: Destroy timer:
> RepeatedTimer{timeout=null, stopped=true, running=false, destroyed=true,
> invoking=false, timeoutMs=1000, name='JRaft-VoteTimer-null-'}.Aug 05, 2021
> 10:03:00 AM org.apache.ignite.lang.IgniteLogger logInternalINFO: Destroy
> timer: RepeatedTimer{timeout=null, stopped=true, running=false,
> destroyed=true, invoking=false, timeoutMs=500,
> name='JRaft-StepDownTimer-null-'}.Aug 05, 2021 10:03:00 AM
> org.apache.ignite.lang.IgniteLogger logInternalINFO: Destroy timer:
> RepeatedTimer{timeout=null, stopped=true, running=false, destroyed=true,
> invoking=false, timeoutMs=3600000, name='JRaft-SnapshotTimer-null-'}.Aug 05,
> 2021 10:03:00 AM org.apache.ignite.lang.IgniteLogger logInternalINFO: DB
> destroyed, the db path is:
> target/work/ITNodeTest/testNodeTaskOverload_1628146949951/log.Aug 05, 2021
> 10:03:00 AM org.apache.ignite.lang.IgniteLogger logInternalINFO: Stop the
> RaftGroupService successfully.Aug 05, 2021 10:03:00 AM
> org.apache.ignite.lang.IgniteLogger logInternalINFO: Stopping
> 127.0.1.1:5003Aug 05, 2021 10:03:00 AM org.apache.ignite.lang.IgniteLogger
> logInternalINFO: Stopped 127.0.1.1:5003Aug 05, 2021 10:03:00 AM
> org.apache.ignite.lang.IgniteLogger logInternalINFO: >>>>>>>>>>>>>>> End test
> method: testNodeTaskOverload(), cost:30185 ms.Aug 05, 2021 10:03:00 AM
> org.apache.ignite.lang.IgniteLogger logInternalINFO: >>>>>>>>>>>>>>> Start
> test method: testRecoverFollower()
> java.util.concurrent.TimeoutException: joined timeout
> at
> org.apache.ignite.raft.jraft.closure.JoinableClosure.join(JoinableClosure.java:50)
> at org.apache.ignite.raft.jraft.entity.Task.join(Task.java:121) at
> org.apache.ignite.raft.jraft.entity.Task.joinAll(Task.java:154) at
> org.apache.ignite.raft.jraft.core.ITNodeTest.testNodeTaskOverload(ITNodeTest.java:256)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method) at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566) at
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
> at
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> at
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> at
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> at
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
> at
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
> at
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> at
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> at
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> at
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> at
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> at
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> at
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
> at
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
> at
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:210)
> at
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> at
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:206)
> at
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:131)
> at
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:65)
> at
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139)
> at
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> at
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
> at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
> at
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
> at
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> at
> org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
> at
> org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
> at java.base/java.util.ArrayList.forEach(ArrayList.java:1541) at
> org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)
> at
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:143)
> at
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> at
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
> at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
> at
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
> at
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> at
> org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
> at
> org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
> at java.base/java.util.ArrayList.forEach(ArrayList.java:1541) at
> org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)
> at
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:143)
> at
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> at
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
> at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
> at
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
> at
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> at
> org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
> at
> org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
> at
> org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:32)
> at
> org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57)
> at
> org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:51)
> at
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:108)
> at
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
> at
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
> at
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:67)
> at
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:52)
> at
> org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:96)
> at
> org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:75)
> at
> com.intellij.junit5.JUnit5IdeaTestRunner.startRunnerWithArgs(JUnit5IdeaTestRunner.java:71)
> at
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
> at
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:221)
> at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:54)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)