[jira] [Comment Edited] (IGNITE-8657) Simultaneous start of bunch of client nodes may lead to some clients hangs

Sergey Chugunov (JIRA) Fri, 08 Jun 2018 09:32:06 -0700


    [ 
https://issues.apache.org/jira/browse/IGNITE-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506233#comment-16506233
 ]


Sergey Chugunov edited comment on IGNITE-8657 at 6/8/18 4:31 PM:
-----------------------------------------------------------------

[~agoncharuk],

Good catch, thanks for spotting this!

I reviewed the code and found that the assertion was caused by a rather unusual 
property, *forceServerMode*.
The problem is that ClusterNode#isClient returns false for such a client, while 
the same check on the SinglePartitionMessage sent from this client returns true.

I'm not sure whether we need to force such clients to reconnect, so I changed 
the implementation in such a way that we don't force them to reconnect. After that 
the test started passing on TC, so I think this logic works.

What do you think?


> Simultaneous start of bunch of client nodes may lead to some clients hangs
> --------------------------------------------------------------------------
>
>                 Key: IGNITE-8657
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8657
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.5
>            Reporter: Sergey Chugunov
>            Assignee: Sergey Chugunov
>            Priority: Major
>             Fix For: 2.6
>
>
> h3. Description
> PartitionExchangeManager uses the system property 
> *IGNITE_EXCHANGE_HISTORY_SIZE* to limit the maximum number of exchange objects 
> and reduce memory consumption.
> The default value of the property is 1000, but in scenarios with many caches and 
> partitions it is reasonable to set the exchange history size to a smaller value 
> of around a few dozen.
> If the user then starts more client nodes at once than the history size, some 
> clients may hang because their exchange information was preempted and is no 
> longer available.
> h3. Workarounds
> Two workarounds are possible: 
> * Do not start more clients at once than the history size.
> * Restart the hanging client node.
> h3. Solution
> Forcing a client node to reconnect when the server detects that it has lost the 
> client's exchange information prevents client nodes from hanging.
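
To make the failure mode in the description easier to follow, here is a minimal, self-contained sketch of a bounded history that drops its oldest entries. It is a toy model only and is not Ignite's actual exchange manager code: the class ExchangeHistorySketch and its methods are hypothetical names introduced for illustration. The only things taken from the issue are the property name IGNITE_EXCHANGE_HISTORY_SIZE and its default of 1000.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical, not Ignite code) of a size-limited exchange history. */
public class ExchangeHistorySketch {
    /** Reads the limit as a JVM system property; 1000 is the default named in the issue. */
    static int historySize() {
        return Integer.getInteger("IGNITE_EXCHANGE_HISTORY_SIZE", 1000);
    }

    /** Insertion-ordered map that drops its eldest entry once maxSize is exceeded. */
    static <K, V> Map<K, V> boundedHistory(final int maxSize) {
        return new LinkedHashMap<K, V>() {
            @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    /**
     * Simulates clientCnt clients joining at once, one history entry per client,
     * and returns the ids of clients whose entry is no longer in the history.
     */
    static List<Integer> clientsWithLostExchange(int clientCnt, int maxSize) {
        Map<Integer, String> hist = boundedHistory(maxSize);

        for (int id = 0; id < clientCnt; id++)
            hist.put(id, "exchange-for-client-" + id);

        List<Integer> lost = new ArrayList<>();

        for (int id = 0; id < clientCnt; id++) {
            if (!hist.containsKey(id))
                lost.add(id);
        }

        return lost;
    }

    public static void main(String[] args) {
        // Five clients, history of three: the two earliest entries are dropped.
        System.out.println(clientsWithLostExchange(5, 3)); // prints [0, 1]
    }
}
```

In this model, the clients returned by clientsWithLostExchange correspond to the ones that may hang; under the proposed solution they would be the candidates for a forced reconnect. When the number of clients does not exceed the history size, the returned list is empty, which matches the first workaround.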



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
