[
https://issues.apache.org/jira/browse/IGNITE-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550827#comment-16550827
]
ASF GitHub Bot commented on IGNITE-9040:
----------------------------------------
GitHub user sergey-chugunov-1985 opened a pull request:
https://github.com/apache/ignite/pull/4395
IGNITE-9040 new FailureHandler for node segmentation special case, test for
the root cause error
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gridgain/apache-ignite ignite-9040
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/ignite/pull/4395.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4395
----
commit 1a76eab29100002f7b8925c051c76763e87511d4
Author: Sergey Chugunov <sergey.chugunov@...>
Date: 2018-07-20T14:27:05Z
IGNITE-9040 new FailureHandler for node segmentation special case, test for
the root cause error
----
> StopNodeFailureHandler is not able to stop node correctly on node segmentation
> ------------------------------------------------------------------------------
>
> Key: IGNITE-9040
> URL: https://issues.apache.org/jira/browse/IGNITE-9040
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.6
> Reporter: Sergey Chugunov
> Assignee: Sergey Chugunov
> Priority: Major
> Fix For: 2.7
>
>
> When the *IGNITE_WAL_LOG_TX_RECORDS* flag is set, special TxRecords are logged
> to the WAL even on node stop.
> With the STOP segmentation policy, *StopNodeFailureHandler* is used to stop the
> segmented node, and it marks the node's state as invalid. As a result, all write
> requests to the WAL fail.
> So, as part of the stop-on-segmentation procedure, the node needs to log a
> TxRecord, but it cannot because its state is marked as invalid. The stop
> procedure then finishes incorrectly, and some threads started by the node are
> not cleaned up.
> Exception example:
> {noformat}
> [2018-07-20 13:35:36,358][ERROR][node-stopper][ZookeeperDiscoverySpiTest0] Failed to pre-stop processor: GridProcessorAdapter []
> class org.apache.ignite.IgniteException: Failed to log TxRecord: TxRecord [state=PREPARED, nearXidVer=GridCacheVersion [topVer=143562918, order=1532082921780, nodeOrder=3], writeVer=GridCacheVersion [topVer=143562918, order=1532082921781, nodeOrder=1], super=TimeStampRecord [timestamp=1532082936349]]
> at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:1132)
> at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:968)
> at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onComplete(GridDhtTxPrepareFuture.java:983)
> at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onDone(GridDhtTxPrepareFuture.java:717)
> at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onDone(GridDhtTxPrepareFuture.java:105)
> at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:462)
> at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.cancelClientFutures(GridCacheMvccManager.java:425)
> at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.onStop(GridCacheMvccManager.java:410)
> at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onKernalStop(GridCacheProcessor.java:984)
> at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2134)
> at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2082)
> at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2595)
> at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2558)
> at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:374)
> at org.apache.ignite.failure.StopNodeFailureHandler$1.run(StopNodeFailureHandler.java:36)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: class org.apache.ignite.internal.pagemem.wal.StorageException: Failed to perform WAL operation (environment was invalidated by a previous error)
> at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkNode(FileWriteAheadLogManager.java:1504)
> at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.access$6100(FileWriteAheadLogManager.java:143)
> at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.addRecord(FileWriteAheadLogManager.java:2611)
> at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$1500(FileWriteAheadLogManager.java:2521)
> at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:758)
> at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:1127)
> ... 15 more
> {noformat}
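The failure sequence described in the issue can be illustrated with a small, self-contained Java toy model. This is only a sketch of the ordering problem, not Ignite code and not the change made in the pull request: ToyWal, ToyNode, and handleSegmentation are hypothetical names invented for illustration, and the model simplifies the real stop procedure to a single WAL write followed by thread cleanup.

```java
// Toy model only: none of these types are Ignite APIs. All names are hypothetical.
final class SegmentationStopSketch {
    /** Stand-in for a WAL manager that rejects writes once the node is marked invalid. */
    static final class ToyWal {
        private boolean invalidated;
        private int loggedRecords;

        void invalidate() {
            invalidated = true;
        }

        void log(String record) {
            if (invalidated)
                throw new IllegalStateException(
                    "Failed to perform WAL operation (environment was invalidated by a previous error)");

            loggedRecords++;
        }

        int loggedRecords() {
            return loggedRecords;
        }
    }

    /** Stand-in for a node whose stop procedure has to log a TxRecord before cleaning up. */
    static final class ToyNode {
        final ToyWal wal = new ToyWal();
        private boolean workerThreadsRunning = true;

        void stop() {
            // Models the case where tx records are logged even on node stop.
            wal.log("TxRecord[state=PREPARED]");

            // Reached only if the WAL write above succeeded.
            workerThreadsRunning = false;
        }

        boolean workerThreadsRunning() {
            return workerThreadsRunning;
        }
    }

    /**
     * Models a failure handler reacting to segmentation.
     * If invalidateFirst is true, the node is marked invalid before stop is attempted.
     * Returns true if the stop procedure completed cleanly.
     */
    static boolean handleSegmentation(ToyNode node, boolean invalidateFirst) {
        if (invalidateFirst)
            node.wal.invalidate();

        try {
            node.stop();

            return true;
        }
        catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ToyNode invalidated = new ToyNode();
        boolean cleanA = handleSegmentation(invalidated, true);
        System.out.println("invalidate first: clean stop = " + cleanA
            + ", threads still running = " + invalidated.workerThreadsRunning());

        ToyNode plain = new ToyNode();
        boolean cleanB = handleSegmentation(plain, false);
        System.out.println("stop only: clean stop = " + cleanB
            + ", threads still running = " + plain.workerThreadsRunning());
    }
}
```

In this model, invalidating before stopping makes the WAL write fail, so the cleanup step is never reached; stopping without invalidating lets the record be written and the cleanup run. Whether the actual fix takes this shape is not stated in this message.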