[
https://issues.apache.org/jira/browse/IGNITE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451986#comment-16451986
]
Alexey Goncharuk commented on IGNITE-7648:
------------------------------------------
[~ascherbakov], a few minor comments:
1) Since you've added exponential backoff before reconnect, please add the
backoff timeout to the logging output.
2) Please cap the maximum delay at some reasonable value. As far as I can
understand, the sleep may become too long if the number of attempts is large.
3) I see a bunch of IgniteCachePutRetryTransactionalSelfTest failures on TC.
Please make sure the failures are not related to your test. It may make sense
to trigger a couple more runs of the failover suite.
[~ilyak], can you also take a look at the change?
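
To illustrate points 1 and 2, here is a minimal sketch of a capped exponential
backoff that also logs the computed delay. It is not the code from the patch
under review: the class name ReconnectBackoff, the method delayMs and the
200 ms / 5000 ms values are all hypothetical and chosen only for the example.

```java
/**
 * Hypothetical helper (not part of Ignite): doubles the reconnect delay on
 * every attempt, but never lets it grow beyond a configured maximum.
 */
final class ReconnectBackoff {
    private final long initialDelayMs;
    private final long maxDelayMs;

    ReconnectBackoff(long initialDelayMs, long maxDelayMs) {
        if (initialDelayMs <= 0 || maxDelayMs < initialDelayMs)
            throw new IllegalArgumentException("Expected 0 < initialDelayMs <= maxDelayMs");

        this.initialDelayMs = initialDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    /**
     * @param attempt Zero-based number of the failed attempt.
     * @return Delay before the next attempt, in milliseconds.
     */
    long delayMs(int attempt) {
        if (attempt < 0)
            throw new IllegalArgumentException("attempt must be >= 0");

        long delay = initialDelayMs;

        // Stop doubling as soon as the cap is reached, so a large attempt
        // count neither overflows nor produces an unbounded sleep.
        for (int i = 0; i < attempt && delay < maxDelayMs; i++)
            delay = delay > maxDelayMs / 2 ? maxDelayMs : delay * 2;

        return delay;
    }

    public static void main(String[] args) {
        // Assumed values for illustration only.
        ReconnectBackoff backoff = new ReconnectBackoff(200, 5_000);

        for (int attempt = 0; attempt < 8; attempt++) {
            long delay = backoff.delayMs(attempt);

            // Point 1: include the backoff timeout in the log message.
            System.out.printf("Reconnect attempt %d failed, next attempt in %d ms%n",
                attempt + 1, delay);

            // A real reconnect loop would sleep for 'delay' ms here.
        }
    }
}
```

With the cap in place the delay stops growing once it reaches the maximum, so
the total wait stays predictable however many attempts are made.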
> Revert IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.
> --------------------------------------------------------
>
> Key: IGNITE-7648
> URL: https://issues.apache.org/jira/browse/IGNITE-7648
> Project: Ignite
> Issue Type: Improvement
> Affects Versions: 2.3
> Reporter: Alexei Scherbakov
> Assignee: Alexei Scherbakov
> Priority: Major
> Fix For: 2.6
>
>
> IGNITE_ENABLE_FORCIBLE_NODE_KILL system property was introduced in
> IGNITE-5718 as a way to prevent unnecessary node drops in case of short
> network problems.
> I think fixing it this way was the wrong decision.
> We faced some issues in our production due to the lack of automatic kicking
> of ill-behaving nodes (for example, nodes hanging due to long GC pauses)
> until we realised the necessity of changing the default behavior via the
> property.
> The right solution is to kick nodes only if a failure threshold is reached.
> Such behavior should always be enabled.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)