[ 
https://issues.apache.org/jira/browse/IGNITE-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550827#comment-16550827
 ] 

ASF GitHub Bot commented on IGNITE-9040:
----------------------------------------

GitHub user sergey-chugunov-1985 opened a pull request:

    https://github.com/apache/ignite/pull/4395

    IGNITE-9040 new FailureHandler for node segmentation special case, test for 
the root cause error

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gridgain/apache-ignite ignite-9040

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/4395.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4395
    
----
commit 1a76eab29100002f7b8925c051c76763e87511d4
Author: Sergey Chugunov <sergey.chugunov@...>
Date:   2018-07-20T14:27:05Z

    IGNITE-9040 new FailureHandler for node segmentation special case, test for 
the root cause error

----


> StopNodeFailureHandler is not able to stop node correctly on node segmentation
> ------------------------------------------------------------------------------
>
>                 Key: IGNITE-9040
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9040
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.6
>            Reporter: Sergey Chugunov
>            Assignee: Sergey Chugunov
>            Priority: Major
>             Fix For: 2.7
>
>
> When flag *IGNITE_WAL_LOG_TX_RECORDS* is set up special TxRecords are logged 
> to WAL even on node stop.
> With STOP segmentation policy *StopNodeFailureHandler* is used to stop the 
> segmented node and it marks node's state as invalid. As a result all write 
> requests to WAL get failed.
> So as part of stop-on-segmentation procedure node needs to log Tx but it 
> cannot as its state is marked as invalid. This leads to stop procedure 
> finishing incorrectly, some threads started by the node are not cleaned up.
> Exception example:
> {noformat}
> [2018-07-20 13:35:36,358][ERROR][node-stopper][ZookeeperDiscoverySpiTest0] 
> Failed to pre-stop processor: GridProcessorAdapter []
> class org.apache.ignite.IgniteException: Failed to log TxRecord: TxRecord 
> [state=PREPARED, nearXidVer=GridCacheVersion [topVer=143562918, 
> order=1532082921780, nodeOrder=3], writeVer=GridCacheVersion 
> [topVer=143562918, order=1532082921781, nodeOrder=1], super=TimeStampRecord 
> [timestamp=1532082936349]]
>       at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:1132)
>       at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:968)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onComplete(GridDhtTxPrepareFuture.java:983)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onDone(GridDhtTxPrepareFuture.java:717)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onDone(GridDhtTxPrepareFuture.java:105)
>       at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:462)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheMvccManager.cancelClientFutures(GridCacheMvccManager.java:425)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheMvccManager.onStop(GridCacheMvccManager.java:410)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.onKernalStop(GridCacheProcessor.java:984)
>       at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2134)
>       at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2082)
>       at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2595)
>       at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2558)
>       at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:374)
>       at 
> org.apache.ignite.failure.StopNodeFailureHandler$1.run(StopNodeFailureHandler.java:36)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: class org.apache.ignite.internal.pagemem.wal.StorageException: 
> Failed to perform WAL operation (environment was invalidated by a previous 
> error)
>       at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkNode(FileWriteAheadLogManager.java:1504)
>       at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.access$6100(FileWriteAheadLogManager.java:143)
>       at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.addRecord(FileWriteAheadLogManager.java:2611)
>       at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$1500(FileWriteAheadLogManager.java:2521)
>       at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:758)
>       at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:1127)
>       ... 15 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to