[ 
https://issues.apache.org/jira/browse/IGNITE-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Chugunov updated IGNITE-9040:
------------------------------------
    Description: 
When the *IGNITE_WAL_LOG_TX_RECORDS* flag is set, special TxRecords are logged
to the WAL even during node stop.

With the STOP segmentation policy, *StopNodeFailureHandler* is used to stop the
segmented node, and it marks the node's state as invalid. As a result, all write
requests to the WAL fail.

So, as part of the stop-on-segmentation procedure, the node needs to log a
TxRecord but cannot, because its state is marked as invalid. As a result, the
stop procedure finishes incorrectly and some threads started by the node are
not cleaned up.

Exception example:
{noformat}
[2018-07-20 13:35:36,358][ERROR][node-stopper][ZookeeperDiscoverySpiTest0] Failed to pre-stop processor: GridProcessorAdapter []
class org.apache.ignite.IgniteException: Failed to log TxRecord: TxRecord [state=PREPARED, nearXidVer=GridCacheVersion [topVer=143562918, order=1532082921780, nodeOrder=3], writeVer=GridCacheVersion [topVer=143562918, order=1532082921781, nodeOrder=1], super=TimeStampRecord [timestamp=1532082936349]]
        at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:1132)
        at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:968)
        at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onComplete(GridDhtTxPrepareFuture.java:983)
        at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onDone(GridDhtTxPrepareFuture.java:717)
        at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onDone(GridDhtTxPrepareFuture.java:105)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:462)
        at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.cancelClientFutures(GridCacheMvccManager.java:425)
        at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.onStop(GridCacheMvccManager.java:410)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onKernalStop(GridCacheProcessor.java:984)
        at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2134)
        at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2082)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2595)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2558)
        at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:374)
        at org.apache.ignite.failure.StopNodeFailureHandler$1.run(StopNodeFailureHandler.java:36)
        at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.internal.pagemem.wal.StorageException: Failed to perform WAL operation (environment was invalidated by a previous error)
        at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkNode(FileWriteAheadLogManager.java:1504)
        at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.access$6100(FileWriteAheadLogManager.java:143)
        at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.addRecord(FileWriteAheadLogManager.java:2611)
        at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$1500(FileWriteAheadLogManager.java:2521)
        at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:758)
        at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:1127)
        ... 15 more
{noformat}
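
The failure sequence described in this issue can be illustrated with a small
standalone sketch. It does not use Ignite classes: SketchWal and SketchNode are
hypothetical stand-ins, and the rule that a failed pre-stop write skips the
cleanup step is an assumption made for illustration only (the issue does not
say by which path the threads are left behind).

```java
public class SegmentationStopSketch {
    /** Hypothetical stand-in for a write-ahead log; not Ignite's FileWriteAheadLogManager. */
    static class SketchWal {
        private final java.util.List<String> records = new java.util.ArrayList<>();
        private boolean invalidated;

        /** Mirrors the failure handler marking the node state as invalid. */
        void invalidate() {
            invalidated = true;
        }

        /** Rejects every write once the environment has been invalidated. */
        void log(String record) {
            if (invalidated)
                throw new IllegalStateException("WAL environment was invalidated");
            records.add(record);
        }

        int size() {
            return records.size();
        }
    }

    /** Hypothetical node whose stop procedure logs a tx record before cleaning up. */
    static class SketchNode {
        final SketchWal wal = new SketchWal();
        boolean workersStopped;

        /** Assumed policy for this sketch: a failed pre-stop write skips the cleanup step. */
        boolean stop() {
            try {
                wal.log("TxRecord[state=PREPARED]");
            }
            catch (IllegalStateException e) {
                return false; // cleanup never runs, so workers are left behind
            }
            workersStopped = true;
            return true;
        }

        /** Models stop-on-segmentation: invalidate the node first, then attempt a normal stop. */
        boolean stopOnSegmentation() {
            wal.invalidate();
            return stop();
        }
    }

    public static void main(String[] args) {
        SketchNode healthy = new SketchNode();
        System.out.println("regular stop clean: " + healthy.stop());

        SketchNode segmented = new SketchNode();
        System.out.println("segmented stop clean: " + segmented.stopOnSegmentation());
    }
}
```

In this sketch a regular stop writes its record and finishes cleanly, while the
stop-on-segmentation path fails at the write because invalidation happens
first. This suggests two possible directions for a fix: skip WAL logging once
the node is invalidated, or make sure cleanup runs regardless of the write
failure. Neither is confirmed by this issue as the chosen approach.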

  was:
When the *IGNITE_WAL_LOG_TX_RECORDS* flag is set, special TxRecords are logged
to the WAL even during node stop.

With the STOP segmentation policy, *StopNodeFailureHandler* is used to stop the
segmented node, and it marks the node's state as invalid. As a result, all write
requests to the WAL fail.

So, as part of the stop-on-segmentation procedure, the node needs to log a
TxRecord but cannot, because its state is marked as invalid. As a result, the
stop procedure finishes incorrectly and some threads started by the node are
not cleaned up.


> StopNodeFailureHandler is not able to stop node correctly on node segmentation
> ------------------------------------------------------------------------------
>
>                 Key: IGNITE-9040
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9040
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.6
>            Reporter: Sergey Chugunov
>            Assignee: Sergey Chugunov
>            Priority: Major
>             Fix For: 2.7
>
>
> When the *IGNITE_WAL_LOG_TX_RECORDS* flag is set, special TxRecords are
> logged to the WAL even during node stop.
> With the STOP segmentation policy, *StopNodeFailureHandler* is used to stop
> the segmented node, and it marks the node's state as invalid. As a result,
> all write requests to the WAL fail.
> So, as part of the stop-on-segmentation procedure, the node needs to log a
> TxRecord but cannot, because its state is marked as invalid. As a result,
> the stop procedure finishes incorrectly and some threads started by the node
> are not cleaned up.
> Exception example:
> {noformat}
> [2018-07-20 13:35:36,358][ERROR][node-stopper][ZookeeperDiscoverySpiTest0] Failed to pre-stop processor: GridProcessorAdapter []
> class org.apache.ignite.IgniteException: Failed to log TxRecord: TxRecord [state=PREPARED, nearXidVer=GridCacheVersion [topVer=143562918, order=1532082921780, nodeOrder=3], writeVer=GridCacheVersion [topVer=143562918, order=1532082921781, nodeOrder=1], super=TimeStampRecord [timestamp=1532082936349]]
>       at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:1132)
>       at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:968)
>       at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onComplete(GridDhtTxPrepareFuture.java:983)
>       at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onDone(GridDhtTxPrepareFuture.java:717)
>       at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onDone(GridDhtTxPrepareFuture.java:105)
>       at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:462)
>       at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.cancelClientFutures(GridCacheMvccManager.java:425)
>       at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.onStop(GridCacheMvccManager.java:410)
>       at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onKernalStop(GridCacheProcessor.java:984)
>       at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2134)
>       at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2082)
>       at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2595)
>       at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2558)
>       at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:374)
>       at org.apache.ignite.failure.StopNodeFailureHandler$1.run(StopNodeFailureHandler.java:36)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: class org.apache.ignite.internal.pagemem.wal.StorageException: Failed to perform WAL operation (environment was invalidated by a previous error)
>       at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkNode(FileWriteAheadLogManager.java:1504)
>       at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.access$6100(FileWriteAheadLogManager.java:143)
>       at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.addRecord(FileWriteAheadLogManager.java:2611)
>       at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$1500(FileWriteAheadLogManager.java:2521)
>       at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:758)
>       at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:1127)
>       ... 15 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
