[ https://issues.apache.org/jira/browse/IGNITE-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403144#comment-17403144 ]
Franco Po commented on IGNITE-15343:
------------------------------------
[~pvinokurov]
The following warning is recorded in the 2 server logs. The full logs are
available in the issue attachments.
{code:java}
Client node considered as unreachable and will be dropped from cluster, because
no metrics update messages received in interval:
TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by network
problems or long GC pause on client node, try to increase this parameter.
[nodeId=b588bb65-6fe8-4aff-8f17-4a9e8733369b,
clientFailureDetectionTimeout=30000]
{code}
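To correlate this warning across the server logs, a small scan like the one below may help. It is only a sketch: `UnreachableClientWarning` is a hypothetical helper, not an Ignite class, and it assumes the bracketed part of the message has the shape shown above. Note that the node ID in the warning also appears in the `id=` field of the TcpDiscoveryClientMetricsUpdateMessage quoted in the issue description.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Ignite): extracts the node ID and the
// timeout from the "Client node considered as unreachable" warning.
final class UnreachableClientWarning {
    // The log wraps the bracketed part across lines, so \s* tolerates a newline.
    private static final Pattern FIELDS = Pattern.compile(
        "\\[nodeId=([0-9a-fA-F-]{36}),\\s*clientFailureDetectionTimeout=(\\d+)\\]");

    final String nodeId;
    final long timeoutMs;

    private UnreachableClientWarning(String nodeId, long timeoutMs) {
        this.nodeId = nodeId;
        this.timeoutMs = timeoutMs;
    }

    // Returns the first matching warning in the text, or empty if there is none.
    static Optional<UnreachableClientWarning> parse(String text) {
        Matcher m = FIELDS.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(
            new UnreachableClientWarning(m.group(1), Long.parseLong(m.group(2))));
    }

    public static void main(String[] args) {
        String sample = "try to increase this parameter.\n"
            + "[nodeId=b588bb65-6fe8-4aff-8f17-4a9e8733369b,\n"
            + "clientFailureDetectionTimeout=30000]";
        parse(sample).ifPresent(w ->
            System.out.println(w.nodeId + " dropped after " + w.timeoutMs + " ms"));
    }
}
```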
I don't think a network problem is a contributing factor, and the JVM
parameters below are in place to reduce GC pause time.
{code}
-server -Xms4g -Xmx4g -XX:+AlwaysPreTouch -XX:+UseG1GC
-XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -Djava.net.preferIPv4Stack=true
{code}
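One way to check whether GC is actually a factor on the client is to read the collector counters from inside the client JVM. The sketch below uses only the standard `java.lang.management` API; the class name `GcTimeReport` is made up for illustration. It reports accumulated collection time, not the length of individual pauses, so a GC log would still be needed to rule out a single long pause.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative only: prints the GC counters of the JVM it runs in.
final class GcTimeReport {
    // Sums the accumulated collection time (ms) over all collectors.
    // getCollectionTime() returns -1 when the value is undefined, so
    // non-positive values are skipped.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime();
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("Total GC time: " + totalGcTimeMillis() + " ms");
    }
}
```

If the total stays far below the 30000 ms clientFailureDetectionTimeout over the period in question, a long GC pause becomes a less likely explanation.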
> NullPointerException occurs when restarting ignite client application
> ---------------------------------------------------------------------
>
> Key: IGNITE-15343
> URL: https://issues.apache.org/jira/browse/IGNITE-15343
> Project: Ignite
> Issue Type: Bug
> Reporter: Franco Po
> Priority: Critical
> Attachments: failed_startup-ignite_info.1st.attempt.log,
> failed_startup-ignite_info.2nd.attempt.log,
> server1-ignite_info.1st.attempt.log, server1-ignite_info.1st.attempt.log,
> successful_startup-ignite_info.log
>
>
> I successfully upgraded one of my API backend applications from Apache Ignite
> 2.6 to GridGain Community Edition 8.8.5 in the live environment a couple of
> months ago. The entire setup is 2 instances of this Ignite client application
> plus a cluster of 2 Ignite server instances. Planned maintenance required a
> restart of the Ignite client application. However, it could not be started
> again because of the following sequence of exceptions (see
> [^failed_startup-ignite_info.1st.attempt.log] and
> [^failed_startup-ignite_info.2nd.attempt.log] for the full logs):
> # java.io.IOException: Failed to get acknowledge for message:
> TcpDiscoveryClientMetricsUpdateMessage [super=TcpDiscoveryAbstractMessage
> [sndNodeId=null, id=fef7e5e5b71-b588bb65-6fe8-4aff-8f17-4a9e8733369b,
> verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=true]]
> # java.net.SocketException: Socket is closed
> # java.lang.NullPointerException: null
> # org.apache.ignite.IgniteCheckedException: Node stopped
> I could restart the same Ignite client applications running in the hot standby
> environment, where the Ignite server contains no active data (see
> [^successful_startup-ignite_info.log]).
> Is this problem related to GG-17439 and IGNITE-11406? Which version of the
> GridGain edition is equivalent to Ignite 2.10?
> If anyone can provide insight as to how I can resolve this, that would be
> greatly appreciated.