[
https://issues.apache.org/jira/browse/IGNITE-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergey Kosarev reassigned IGNITE-9296:
--------------------------------------
Assignee: Sergey Kosarev
> Stopping node by Failure Handler hangs up in IgniteWalFlushBackgroundSelfTest
> ------------------------------------------------------------------------------
>
> Key: IGNITE-9296
> URL: https://issues.apache.org/jira/browse/IGNITE-9296
> Project: Ignite
> Issue Type: Bug
> Reporter: Sergey Kosarev
> Assignee: Sergey Kosarev
> Priority: Major
> Attachments: logs.zip
>
>
> Here are log messages:
> {code}
> [18:46:27]W: [org.apache.ignite:ignite-core] [2018-08-15
> 15:46:27,442][ERROR][main][root] Test has been timed out and will be
> interrupted (threads dump will be taken before interruption)
> [test=testFailWhileStart, timeout=60000]
> {code}
> Later on, the whole suite also hangs:
> {code}
> [22:22:49]E: [Step 3/4] The build Ignite Tests 2.4+ (Java 8)::PDS 2 #2184
> {buildId=1662285} has been running for more than 240 minutes. Terminating...
> The main test thread is blocked on a lock held by node-stopper:
> [18:46:27] : [Step 3/4] Thread
> [name="test-runner-#7695%wal.IgniteWalFlushBackgroundSelfTest%", id=9150,
> state=BLOCKED, blockCnt=4, waitCnt=142]
> [18:46:27] : [Step 3/4] Lock
> [object=o.a.i.i.IgnitionEx$IgniteNamedInstance@7c251f90,
> ownerName=node-stopper, ownerId=9267]
> [18:46:27] : [Step 3/4] at
> o.a.i.i.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2565)
> [18:46:27] : [Step 3/4] at
> o.a.i.i.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2557)
> [18:46:27] : [Step 3/4] at
> o.a.i.i.IgnitionEx.stop(IgnitionEx.java:374)
> [18:46:27] : [Step 3/4] at o.a.i.Ignition.stop(Ignition.java:229)
> [18:46:27] : [Step 3/4] at
> o.a.i.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1088)
> [18:46:27] : [Step 3/4] at
> o.a.i.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1131)
> [18:46:27] : [Step 3/4] at
> o.a.i.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1109)
> [18:46:27] : [Step 3/4] at
> o.a.i.i.processors.cache.persistence.db.wal.IgniteWalFlushMultiNodeFailoverAbstractSelfTest.failWhilePut(IgniteWalFlushMultiNodeFailoverAbstractSelfTest.java:213)
> [18:46:27] : [Step 3/4] at
> o.a.i.i.processors.cache.persistence.db.wal.IgniteWalFlushMultiNodeFailoverAbstractSelfTest.testFailWhileStart(IgniteWalFlushMultiNodeFailoverAbstractSelfTest.java:147)
> node-stopper waits for wal-segment-syncer to stop:
> [18:46:28]W: [org.apache.ignite:ignite-core] Thread
> [name="node-stopper", id=9267, state=WAITING, blockCnt=19, waitCnt=22]
> [18:46:28]W: [org.apache.ignite:ignite-core] Lock
> [object=java.lang.Object@5ba26eb0, ownerName=null, ownerId=-1]
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> java.lang.Object.wait(Native Method)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> java.lang.Object.wait(Object.java:502)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> o.a.i.i.util.worker.GridWorker.join(GridWorker.java:233)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> o.a.i.i.util.IgniteUtils.join(IgniteUtils.java:4692)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WalSegmentSyncer.shutdown(FileWriteAheadLogManager.java:3562)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WalSegmentSyncer.access$700(FileWriteAheadLogManager.java:3527)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.stop0(FileWriteAheadLogManager.java:578)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> o.a.i.i.processors.cache.GridCacheSharedManagerAdapter.stop(GridCacheSharedManagerAdapter.java:94)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> o.a.i.i.processors.cache.GridCacheProcessor.stop(GridCacheProcessor.java:951)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> o.a.i.i.IgniteKernal.stop0(IgniteKernal.java:2303)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> o.a.i.i.IgniteKernal.stop(IgniteKernal.java:2181)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> o.a.i.i.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2594)
> [18:46:28]W: [org.apache.ignite:ignite-core] - locked
> o.a.i.i.IgnitionEx$IgniteNamedInstance@7c251f90
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> o.a.i.i.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2557)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> o.a.i.i.IgnitionEx.stop(IgnitionEx.java:374)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> o.a.i.failure.StopNodeFailureHandler$1.run(StopNodeFailureHandler.java:36)
> wal-segment-syncer waits until wal-write-worker flushes data:
> [18:46:28]W: [org.apache.ignite:ignite-core] Thread
> [name="wal-segment-syncer-#7782%wal.IgniteWalFlushBackgroundSelfTest1%",
> id=9253, state=RUNNABLE, blockCnt=0, waitCnt=860657904]
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> sun.misc.Unsafe.park(Native Method)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.flushBuffer(FileWriteAheadLogManager.java:3455)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.flushAll(FileWriteAheadLogManager.java:3419)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.flush(FileWriteAheadLogManager.java:2704)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.flushOrWait(FileWriteAheadLogManager.java:2696)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.fsync(FileWriteAheadLogManager.java:2776)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$1900(FileWriteAheadLogManager.java:2538)
> [18:46:28]W: [org.apache.ignite:ignite-core] at
> o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.flush(FileWriteAheadLogManager.java:820)
> And there is no wal-write-worker on the node, as it has already been interrupted:
> [18:45:34]W: [org.apache.ignite:ignite-core] [2018-08-15
> 15:45:34,132][ERROR][wal-write-worker%wal.IgniteWalFlushBackgroundSelfTest1-#7783%wal.IgniteWalFlushBackgroundSelfTest1%][IgniteTestResources]
> Critical system error detected. Will be handled accordingly to
> configured handler [hnd=class o.a.i.failure.StopNodeFailureHandler,
> failureCtx=FailureContext [type=CRITICAL_ERROR,
> err=class o.a.i.i.pagemem.wal.StorageException: Failed to write buffer.]]
> Caused by: java.io.IOException: No space left on device (This exception is
> generated intentionally by test logic)
> {code}
> As there is no wal-write-worker any more, wal-segment-syncer will wait
> forever.
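
The wait described above (a thread parking until a writer thread flushes, while that writer is already gone) can be illustrated with a minimal standalone sketch. This is not Ignite code and not a proposed patch: the class name FlushWaitSketch, the method awaitFlush, and the 10 ms park interval are all hypothetical, chosen only to show one way a waiter could notice that the thread it depends on has terminated.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical illustration only; names and structure are assumptions, not Ignite internals. */
public class FlushWaitSketch {
    /**
     * Waits for the writer to signal a completed flush, but gives up once the
     * writer thread is no longer alive instead of parking indefinitely.
     *
     * @return true if the flush completed, false if the writer died first.
     */
    public static boolean awaitFlush(AtomicBoolean flushed, Thread writer) {
        while (!flushed.get()) {
            if (!writer.isAlive())
                return flushed.get(); // Re-check: the writer may have flushed right before exiting.

            // Bounded park so that writer liveness is re-checked periodically.
            LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(10));
        }

        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean flushed = new AtomicBoolean();

        // Stand-in for a writer that fails before flushing (as with the
        // intentional "No space left on device" error in the test).
        Thread writer = new Thread(() -> { /* exits without setting flushed */ }, "writer-sketch");

        writer.start();
        writer.join();

        System.out.println("flush completed: " + awaitFlush(flushed, writer));
    }
}
```

With an unbounded park and no liveness check, the same call would never return in this scenario, which matches the wal-segment-syncer stack trace above. Whether a liveness check, a cancellation flag, or failing the pending flush is the right fix for FileWriteAheadLogManager is left open here.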