[ 
https://issues.apache.org/jira/browse/IGNITE-11707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Goncharuk updated IGNITE-11707:
--------------------------------------
    Description: 
I've stumbled across the following behavior on a large cluster with a large 
number of caches:
When several new nodes are being added to the cluster, a client node may hang 
indefinitely on join. On the server nodes, the TCP discovery message worker can 
be observed continuously processing metrics update messages and writing 
metrics to the socket. The logs made it clear that the cluster generated so 
many metrics update messages that a node could not keep up with them. 
Even when the metrics update message is generated on the coordinator, this 
scenario is possible when the message round-trip/processing time is comparable 
to the metrics update frequency.

To mitigate the issue, we should drop a not-yet-processed metrics update 
message when a new metrics update message is received.
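
As a rough illustration of the proposed mitigation, here is a minimal sketch 
of a pending-message queue that keeps at most one metrics update at a time. 
It is not Ignite's actual discovery code: the Msg and CoalescingQueue classes, 
the "METRICS_UPDATE" type string, and the method names are all hypothetical 
and exist only for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical discovery message; not an actual Ignite class. */
class Msg {
    final String type;
    final long id;

    Msg(String type, long id) {
        this.type = type;
        this.id = id;
    }

    /** The type string is an assumption made for this sketch. */
    boolean isMetricsUpdate() {
        return "METRICS_UPDATE".equals(type);
    }
}

/** Hypothetical pending-message queue that holds at most one metrics update. */
class CoalescingQueue {
    private final Deque<Msg> queue = new ArrayDeque<>();

    /** Adds a message, first dropping any metrics update that is still pending. */
    synchronized void add(Msg msg) {
        if (msg.isMetricsUpdate()) {
            // The newer update supersedes any not-yet-processed one.
            queue.removeIf(Msg::isMetricsUpdate);
        }
        queue.addLast(msg);
    }

    /** Returns the next message to process, or null if the queue is empty. */
    synchronized Msg poll() {
        return queue.pollFirst();
    }

    synchronized int size() {
        return queue.size();
    }
}
```

With this approach the number of queued metrics updates stays bounded at one 
no matter how fast they arrive, while all other messages keep their relative 
order. Note that in this sketch the surviving metrics update moves to the tail 
of the queue; replacing the stale message in place would be an alternative if 
its position matters.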

> Tcp Discovery should drop pending metrics update message when new message is 
> received
> -------------------------------------------------------------------------------------
>
>                 Key: IGNITE-11707
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11707
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexey Goncharuk
>            Priority: Major
>             Fix For: 2.8
>



