[jira] [Commented] (IGNITE-7648) Revert IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.

Igor Seliverstov (JIRA) Mon, 19 Feb 2018 09:07:27 -0800

    [ https://issues.apache.org/jira/browse/IGNITE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369323#comment-16369323 ]


Igor Seliverstov commented on IGNITE-7648:
------------------------------------------

[~ascherbakov], the code looks OK, but I'd change
{code:java}
long delay = failureDetectionTimeoutEnabled() ?
    failureDetectionTimeout() / reconCnt :
    connTimeout0 - (U.currentTimeMillis() - start);
{code}
to something like:
{code:java}
long delay = failureDetectionTimeoutEnabled() ?
    timeoutHelper.remainingTime(U.currentTimeMillis()) / (reconCnt - attempt) :
    connTimeout0 - (U.currentTimeMillis() - start);
{code}
in {{org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi#createTcpClient}}.
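
To illustrate the difference between the two formulas, here is a minimal standalone sketch. It does not touch TcpCommunicationSpi: the class and method names are made up for illustration, and it assumes {{attempt}} is the zero-based number of attempts already made, so that {{reconCnt - attempt}} stays positive.

```java
/** Hypothetical standalone sketch; not part of Ignite. */
public class ReconnectDelaySketch {
    /** First formula: every attempt gets the same fixed slice of the total timeout. */
    public static long fixedDelay(long failureDetectionTimeout, int reconCnt) {
        return failureDetectionTimeout / reconCnt;
    }

    /**
     * Second formula: the time still left is split over the attempts not yet made.
     * Assumption: attempt counts attempts already made (0 for the first try).
     */
    public static long remainingDelay(long remainingTime, int reconCnt, int attempt) {
        int attemptsLeft = reconCnt - attempt;

        if (attemptsLeft <= 0)
            throw new IllegalArgumentException("No attempts left: reconCnt=" + reconCnt + ", attempt=" + attempt);

        return remainingTime / attemptsLeft;
    }

    public static void main(String[] args) {
        long timeout = 10_000L; // total failure detection timeout, ms
        int reconCnt = 4;       // maximum number of connection attempts

        // Fixed slice, regardless of how long earlier attempts actually took.
        System.out.println(fixedDelay(timeout, reconCnt)); // 2500

        // The first attempt failed after 100 ms, so 9900 ms remain for 3 attempts.
        System.out.println(remainingDelay(9_900L, reconCnt, 1)); // 3300
    }
}
```

With the second formula, an attempt that fails quickly leaves its unused time to the later attempts, so the whole timeout budget is spent before giving up.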

Also, I'm not sure it is a good idea to enable force kill by default.

Let's consider the following example:

We have successfully joined the topology but, due to some local issue, cannot 
open a direct connection to any node via Communication SPI.

In that case, with your approach, we will kill every node we try to send a 
message to.

Even in its current shape, IGNITE_ENABLE_FORCIBLE_NODE_KILL doesn't look like a 
production feature and, in my opinion, cannot be enabled by default.

 

> Revert IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.
> --------------------------------------------------------
>
>                 Key: IGNITE-7648
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7648
>             Project: Ignite
>          Issue Type: Improvement
>    Affects Versions: 2.3
>            Reporter: Alexei Scherbakov
>            Assignee: Alexei Scherbakov
>            Priority: Major
>             Fix For: 2.5
>
>
> The IGNITE_ENABLE_FORCIBLE_NODE_KILL system property was introduced in 
> IGNITE-5718 as a way to prevent unnecessary node drops in case of short 
> network problems.
> I suppose it's the wrong decision to fix it in such a way.
> We faced some issues in production due to the lack of automatic kicking 
> of ill-behaving nodes (for example, nodes hanging due to long GC pauses) 
> until we realised the necessity of changing the default behavior via the 
> property.
> The right solution is to kick nodes only if a failure threshold is reached. 
> Such behavior should always be enabled.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
