[
https://issues.apache.org/jira/browse/IGNITE-11707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Goncharuk updated IGNITE-11707:
--------------------------------------
Description:
I've stumbled across the following behavior on a large cluster with a large
number of caches:
When several new nodes are being added to the cluster, a client node may hang
indefinitely on join. On server nodes one can observe the TCP discovery message
worker continuously processing metrics update messages and writing metrics to
the socket. From the logs it was clear that the cluster generated a lot of
metrics update messages and a node could not keep up with them.
Even when the metrics update message is generated on the coordinator, this
scenario is possible when the message round-trip/processing time is comparable
to the interval between metrics updates.
To mitigate the issue, we should drop a not-yet-processed metrics update
message when a new metrics update message is received.
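
The proposed mitigation could be sketched roughly as follows. This is only an
illustration of the idea, not Ignite's discovery code: the Msg and
CoalescingQueue classes, their fields, and the payload strings are all
hypothetical names made up for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical message type; it does not correspond to Ignite's discovery message classes. */
final class Msg {
    final boolean metricsUpdate;
    final String payload;

    Msg(boolean metricsUpdate, String payload) {
        this.metricsUpdate = metricsUpdate;
        this.payload = payload;
    }
}

/** Hypothetical worker queue that holds at most one pending metrics update. */
final class CoalescingQueue {
    private final Deque<Msg> queue = new ArrayDeque<>();

    /** Adds a message; a new metrics update replaces any not-yet-processed one. */
    synchronized void add(Msg msg) {
        if (msg.metricsUpdate) {
            // Drop the stale metrics update (if any) that is still waiting to be processed.
            queue.removeIf(m -> m.metricsUpdate);
        }
        queue.addLast(msg);
    }

    /** Returns the next message to process, or null if the queue is empty. */
    synchronized Msg poll() {
        return queue.pollFirst();
    }

    /** Number of messages still waiting to be processed. */
    synchronized int size() {
        return queue.size();
    }
}
```

With this sketch, adding "metrics-1", then a non-metrics message, then
"metrics-2" leaves only two entries in the queue, so the backlog of metrics
updates cannot grow beyond one regardless of how fast they arrive. Note that
in this version the surviving metrics update moves to the tail of the queue;
whether the real fix should preserve the original position is an open design
choice.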
> Tcp Discovery should drop pending metrics update message when new message is
> received
> -------------------------------------------------------------------------------------
>
> Key: IGNITE-11707
> URL: https://issues.apache.org/jira/browse/IGNITE-11707
> Project: Ignite
> Issue Type: Improvement
> Reporter: Alexey Goncharuk
> Priority: Major
> Fix For: 2.8
>