[ 
https://issues.apache.org/jira/browse/IGNITE-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Kosarev reassigned IGNITE-9296:
--------------------------------------

    Assignee: Sergey Kosarev

>  Stopping node by Failure Handler hangs up in IgniteWalFlushBackgroundSelfTest
> ------------------------------------------------------------------------------
>
>                 Key: IGNITE-9296
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9296
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Sergey Kosarev
>            Assignee: Sergey Kosarev
>            Priority: Major
>         Attachments: logs.zip
>
>
> Here are log messages:
> {code}
> [18:46:27]W:             [org.apache.ignite:ignite-core] [2018-08-15 
> 15:46:27,442][ERROR][main][root] Test has been timed out and will be 
> interrupted (threads dump will be taken before interruption) 
> [test=testFailWhileStart, timeout=60000]
> {code}
> Later on, the whole suite also hangs:
> {code}
> [22:22:49]E:     [Step 3/4] The build Ignite Tests 2.4+ (Java 8)::PDS 2 #2184 
> {buildId=1662285} has been running for more than 240 minutes. Terminating...
> The main thread is blocked by node-stopper:
> [18:46:27] :     [Step 3/4] Thread 
> [name="test-runner-#7695%wal.IgniteWalFlushBackgroundSelfTest%", id=9150, 
> state=BLOCKED, blockCnt=4, waitCnt=142]
> [18:46:27] :     [Step 3/4]     Lock 
> [object=o.a.i.i.IgnitionEx$IgniteNamedInstance@7c251f90, 
> ownerName=node-stopper, ownerId=9267]
> [18:46:27] :     [Step 3/4]         at 
> o.a.i.i.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2565)
> [18:46:27] :     [Step 3/4]         at 
> o.a.i.i.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2557)
> [18:46:27] :     [Step 3/4]         at 
> o.a.i.i.IgnitionEx.stop(IgnitionEx.java:374)
> [18:46:27] :     [Step 3/4]         at o.a.i.Ignition.stop(Ignition.java:229)
> [18:46:27] :     [Step 3/4]         at 
> o.a.i.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1088)
> [18:46:27] :     [Step 3/4]         at 
> o.a.i.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1131)
> [18:46:27] :     [Step 3/4]         at 
> o.a.i.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1109)
> [18:46:27] :     [Step 3/4]         at 
> o.a.i.i.processors.cache.persistence.db.wal.IgniteWalFlushMultiNodeFailoverAbstractSelfTest.failWhilePut(IgniteWalFlushMultiNodeFailoverAbstractSelfTest.java:213)
> [18:46:27] :     [Step 3/4]         at 
> o.a.i.i.processors.cache.persistence.db.wal.IgniteWalFlushMultiNodeFailoverAbstractSelfTest.testFailWhileStart(IgniteWalFlushMultiNodeFailoverAbstractSelfTest.java:147)
> node-stopper waits for wal-segment-syncer to stop:
> [18:46:28]W:             [org.apache.ignite:ignite-core] Thread 
> [name="node-stopper", id=9267, state=WAITING, blockCnt=19, waitCnt=22]
> [18:46:28]W:             [org.apache.ignite:ignite-core]     Lock 
> [object=java.lang.Object@5ba26eb0, ownerName=null, ownerId=-1]
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> java.lang.Object.wait(Native Method)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> java.lang.Object.wait(Object.java:502)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> o.a.i.i.util.worker.GridWorker.join(GridWorker.java:233)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> o.a.i.i.util.IgniteUtils.join(IgniteUtils.java:4692)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WalSegmentSyncer.shutdown(FileWriteAheadLogManager.java:3562)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WalSegmentSyncer.access$700(FileWriteAheadLogManager.java:3527)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.stop0(FileWriteAheadLogManager.java:578)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> o.a.i.i.processors.cache.GridCacheSharedManagerAdapter.stop(GridCacheSharedManagerAdapter.java:94)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> o.a.i.i.processors.cache.GridCacheProcessor.stop(GridCacheProcessor.java:951)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> o.a.i.i.IgniteKernal.stop0(IgniteKernal.java:2303)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> o.a.i.i.IgniteKernal.stop(IgniteKernal.java:2181)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> o.a.i.i.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2594)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         - locked 
> o.a.i.i.IgnitionEx$IgniteNamedInstance@7c251f90
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> o.a.i.i.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2557)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> o.a.i.i.IgnitionEx.stop(IgnitionEx.java:374)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> o.a.i.failure.StopNodeFailureHandler$1.run(StopNodeFailureHandler.java:36)
> wal-segment-syncer waits until wal-write-worker flushes data:
> [18:46:28]W:             [org.apache.ignite:ignite-core] Thread 
> [name="wal-segment-syncer-#7782%wal.IgniteWalFlushBackgroundSelfTest1%", 
> id=9253, state=RUNNABLE, blockCnt=0, waitCnt=860657904]
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> sun.misc.Unsafe.park(Native Method)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.flushBuffer(FileWriteAheadLogManager.java:3455)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.flushAll(FileWriteAheadLogManager.java:3419)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.flush(FileWriteAheadLogManager.java:2704)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.flushOrWait(FileWriteAheadLogManager.java:2696)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.fsync(FileWriteAheadLogManager.java:2776)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$1900(FileWriteAheadLogManager.java:2538)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at 
> o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.flush(FileWriteAheadLogManager.java:820)
> And there is no wal-write-worker on the node, as it has already been interrupted:
> [18:45:34]W:             [org.apache.ignite:ignite-core] [2018-08-15 
> 15:45:34,132][ERROR][wal-write-worker%wal.IgniteWalFlushBackgroundSelfTest1-#7783%wal.IgniteWalFlushBackgroundSelfTest1%][IgniteTestResources] 
> Critical system error detected. Will be handled accordingly to 
> configured handler [hnd=class o.a.i.failure.StopNodeFailureHandler, 
> failureCtx=FailureContext [type=CRITICAL_ERROR, err=class 
> o.a.i.i.pagemem.wal.StorageException: Failed to write buffer.]]
> Caused by: java.io.IOException: No space left on device (This exception is 
> generated intentionally by test logic)
> {code}
> Since the wal-write-worker is gone, wal-segment-syncer will wait forever.
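
The wait pattern described above can be illustrated with a minimal, self-contained Java sketch. It is not Ignite's code and not the fix for this issue: the class `FlushWaitSketch`, the method `awaitFlush` and the `flushedPos` counter are hypothetical names used only for illustration. The sketch assumes a waiter that parks until a writer thread advances a flushed position, and shows one possible guard: park for a bounded time and stop waiting once the writer thread is no longer alive.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical sketch: a flush wait that cannot outlive the writer thread. */
public class FlushWaitSketch {
    /**
     * Waits until {@code flushedPos} reaches {@code expPos}.
     *
     * @param flushedPos Position the writer thread has flushed up to.
     * @param expPos Position the caller needs to be flushed.
     * @param writer Thread that is expected to advance {@code flushedPos}.
     * @return {@code true} if the position was reached, {@code false} if the writer died first.
     */
    static boolean awaitFlush(AtomicLong flushedPos, long expPos, Thread writer) {
        while (flushedPos.get() < expPos) {
            // If the writer has terminated, nobody will ever advance the position.
            // Re-check once, because the writer may have flushed right before exiting.
            if (!writer.isAlive())
                return flushedPos.get() >= expPos;

            // Bounded park instead of an unbounded LockSupport.park().
            LockSupport.parkNanos(1_000_000L);
        }

        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong flushedPos = new AtomicLong();

        // Simulates a writer that has already exited after a failure.
        Thread deadWriter = new Thread(() -> { /* exits immediately */ }, "writer-sketch");
        deadWriter.start();
        deadWriter.join();

        // Returns instead of parking forever.
        System.out.println("flushed=" + awaitFlush(flushedPos, 1, deadWriter));
    }
}
```

With an unbounded `LockSupport.park()` and no liveness check, the same loop would never exit once the writer thread has terminated, which matches the state of wal-segment-syncer in the thread dump.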



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
