[
https://issues.apache.org/jira/browse/IGNITE-7476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343343#comment-16343343
]
ASF GitHub Bot commented on IGNITE-7476:
----------------------------------------
GitHub user alamar opened a pull request:
https://github.com/apache/ignite/pull/3448
IGNITE-7476 IGNITE-7519 needed for reproducer of IGNITE-7540
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gridgain/apache-ignite ignite-7476-7519
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/ignite/pull/3448.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3448
----
commit 246239105c7ca5b835ed1767f4f6be667f374ccc
Author: Ilya Kasnacheev <ilya.kasnacheev@...>
Date: 2018-01-29T13:18:21Z
IGNITE-7519 Avoid suppression of exceptions by IsolatedUpdater in data
streamer.
commit 83ebbe5630cca6b40e96d61637ba4e5058b3c253
Author: Ilya Kasnacheev <ilya.kasnacheev@...>
Date: 2018-01-29T13:20:01Z
IGNITE-7476 Avoid NPE during metrics gathering leading to discovery thread
failure.
----
> Server node will join with failure gathering metrics
> ----------------------------------------------------
>
> Key: IGNITE-7476
> URL: https://issues.apache.org/jira/browse/IGNITE-7476
> Project: Ignite
> Issue Type: Bug
> Reporter: Ilya Kasnacheev
> Priority: Critical
>
> Sometimes a server node fails with the following trace:
> {code:java}
> SEVERE: TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in order to prevent cluster wide instability.
> java.lang.NullPointerException
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$7.cacheMetrics(GridDiscoveryManager.java:1149)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMetricsUpdateMessage(ServerImpl.java:5022)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2690)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2491)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6675)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2574)
>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62){code}
> Two problems here:
> * An uncaught exception in cacheMetrics() unconditionally fails the node,
> because it is thrown in the discovery thread. All non-trivial code there
> should probably be wrapped in try-catch.
> * Lack of proper locking when destroying a cache (see also IGNITE-6580,
> IGNITE-7278 and IGNITE-7165).
>
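For illustration only, the first point (wrapping the metrics callback in try-catch so a failure cannot kill the calling thread) might look roughly like the sketch below. SafeMetricsSketch and collectSafely are hypothetical names invented for this example; this is not the code from the pull request and does not use any Ignite API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper, not part of Ignite: guards a metrics callback so that
// a failure inside it cannot propagate into the calling (discovery) thread.
final class SafeMetricsSketch {
    private SafeMetricsSketch() {
        // Static helper only.
    }

    // Runs the provider and returns its result; on any runtime failure
    // (for example an NPE caused by a concurrently destroyed cache)
    // logs the problem and returns an empty map instead of rethrowing.
    static <K, V> Map<K, V> collectSafely(Supplier<Map<K, V>> provider) {
        try {
            Map<K, V> res = provider.get();

            return res != null ? res : Collections.<K, V>emptyMap();
        }
        catch (RuntimeException e) {
            System.err.println("Failed to gather cache metrics, skipping this update: " + e);

            return Collections.emptyMap();
        }
    }

    public static void main(String[] args) {
        // Normal case: metrics are returned as-is.
        Map<Integer, String> ok = collectSafely(() -> {
            Map<Integer, String> m = new HashMap<>();
            m.put(1, "cache-1 metrics");
            return m;
        });

        // Failure case: simulates a cache that disappeared mid-collection.
        Map<Integer, String> failed = collectSafely(() -> {
            String destroyedCache = null;
            Map<Integer, String> m = new HashMap<>();
            m.put(2, destroyedCache.trim()); // throws NullPointerException
            return m;
        });

        System.out.println(ok.size() + " " + failed.size());
    }
}
```

A guard like this only addresses the first problem; the missing locking around cache destroy would still need a separate fix.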