[ 
https://issues.apache.org/jira/browse/IGNITE-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403144#comment-17403144
 ] 

Franco Po commented on IGNITE-15343:
------------------------------------

[~pvinokurov]

Warning messages like the one below were recorded in the logs of both servers. 
The full logs are available in the attachments above.
{code:java}
Client node considered as unreachable and will be dropped from cluster, because 
no metrics update messages received in interval: 
TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by network 
problems or long GC pause on client node, try to increase this parameter. 
[nodeId=b588bb65-6fe8-4aff-8f17-4a9e8733369b, 
clientFailureDetectionTimeout=30000]
{code}

I don't think a network problem is a contributing factor, and the JVM 
parameters below are in place to reduce GC pause time.
{code}
-server -Xms4g -Xmx4g -XX:+AlwaysPreTouch -XX:+UseG1GC 
-XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -Djava.net.preferIPv4Stack=true
{code}
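
One rough way to check the GC side of this is to sample the JVM's accumulated GC time on the client and compare it with the 30000 ms timeout from the warning. The sketch below is only an illustration: the class and method names (GcPauseCheck, totalGcTimeMillis, gcCouldExplainDrop) are made up for this example and are not part of Ignite. It relies on the standard GarbageCollectorMXBean API only. If the GC time accumulated over a window is below the timeout, no single GC pause in that window could have lasted that long; the reverse does not hold, since the total may be spread over many short pauses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Ignite: coarse check of GC time against
// the clientFailureDetectionTimeout value reported in the warning.
final class GcPauseCheck {
    // Value copied from the warning message (clientFailureDetectionTimeout=30000).
    static final long CLIENT_FAILURE_DETECTION_TIMEOUT_MS = 30_000L;

    // Sums the approximate accumulated collection time of all collectors in this JVM.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // True only if the GC time accumulated between two samples is at least the
    // timeout, i.e. a GC pause of that length cannot be ruled out by this check.
    static boolean gcCouldExplainDrop(long gcBeforeMs, long gcAfterMs, long timeoutMs) {
        return (gcAfterMs - gcBeforeMs) >= timeoutMs;
    }

    public static void main(String[] args) throws InterruptedException {
        // Sampling window in ms; defaults to 1 second if no argument is given.
        long windowMs = args.length > 0 ? Long.parseLong(args[0]) : 1_000L;
        long before = totalGcTimeMillis();
        Thread.sleep(windowMs);
        long after = totalGcTimeMillis();
        System.out.println("GC time in window (ms): " + (after - before));
        System.out.println("Could GC alone cover the timeout: "
                + gcCouldExplainDrop(before, after, CLIENT_FAILURE_DETECTION_TIMEOUT_MS));
    }
}
```

In practice the sampling would have to run inside the client JVM around the time of the failed startup; GC logs would give the per-pause detail that this cumulative figure cannot.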

> NullPointerException occurs when restarting ignite client application
> ---------------------------------------------------------------------
>
>                 Key: IGNITE-15343
>                 URL: https://issues.apache.org/jira/browse/IGNITE-15343
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Franco Po
>            Priority: Critical
>         Attachments: failed_startup-ignite_info.1st.attempt.log, 
> failed_startup-ignite_info.2nd.attempt.log, 
> server1-ignite_info.1st.attempt.log, server1-ignite_info.1st.attempt.log, 
> successful_startup-ignite_info.log
>
>
> I upgraded one of my API backend applications from Apache Ignite 2.6 to 
> GridGain Community Edition 8.8.5 successfully in the live environment a couple 
> of months ago. The entire setup is 2 instances of this Ignite client 
> application plus a cluster of 2 Ignite server instances. Planned maintenance 
> required restarting the Ignite client application. However, it couldn't be 
> started again due to the following sequence of exceptions (see 
> [^failed_startup-ignite_info.1st.attempt.log] and 
> [^failed_startup-ignite_info.2nd.attempt.log] for the full logs):
>  # java.io.IOException: Failed to get acknowledge for message: 
> TcpDiscoveryClientMetricsUpdateMessage [super=TcpDiscoveryAbstractMessage 
> [sndNodeId=null, id=fef7e5e5b71-b588bb65-6fe8-4aff-8f17-4a9e8733369b, 
> verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=true]]
>  # java.net.SocketException: Socket is closed
>  # java.lang.NullPointerException: null
>  # org.apache.ignite.IgniteCheckedException: Node stopped
> I could restart the same Ignite client applications running in the hot standby 
> environment, where the Ignite servers contain no active data (see 
> [^successful_startup-ignite_info.log]).
> Is this problem related to GG-17439 and IGNITE-11406? Which GridGain edition 
> version is equivalent to Ignite 2.10?
> If anyone can provide insight into how I can resolve this, that would be 
> greatly appreciated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
