[ 
https://issues.apache.org/jira/browse/IGNITE-26209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Plekhanov updated IGNITE-26209:
---------------------------------------
    Ignite Flags: Release Notes Required  (was: Docs Required,Release Notes Required)

> Add metrics to improve node network unavailability detection
> ------------------------------------------------------------
>
>                 Key: IGNITE-26209
>                 URL: https://issues.apache.org/jira/browse/IGNITE-26209
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Aleksey Plekhanov
>            Assignee: Aleksey Plekhanov
>            Priority: Major
>              Labels: ise
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Any monitoring system collects metrics at discrete intervals. Between two 
> collections, a metric may exceed its critical threshold and return to normal 
> before the next collection. For example, a short-term network unavailability 
> of a node can significantly increase operation latency. The problem with that 
> particular node could be detected by an increase in the size of its outgoing 
> message queue for the TCP Communication SPI. However, if the metrics 
> collection interval is longer than the node's downtime, such spikes can go 
> unnoticed. Metrics are needed that record bursts in the accumulated message 
> queues of the Discovery/Communication SPIs over a certain period of time.
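
One way to picture the kind of metric described above is a "peak over a recent window" gauge. The sketch below is only an illustration under stated assumptions: the class name PeakQueueSizeTracker, its constructor parameters, and the bucket-based windowing are hypothetical and are not taken from the Ignite code base or from the change made for this issue. It keeps the maximum queue size seen in each fixed-length time bucket and reports the largest value across the buckets that still fall inside the window.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

/** Hypothetical helper, NOT part of the Ignite API: tracks the peak value over a sliding window. */
final class PeakQueueSizeTracker {
    private final long bucketMillis;   // Length of one time bucket.
    private final long[] bucketMax;    // Max value recorded in each slot.
    private final long[] bucketIdx;    // Absolute bucket number stored in each slot (-1 = empty).
    private final LongSupplier clock;  // Millisecond clock (assumed non-negative), injectable for tests.

    PeakQueueSizeTracker(int buckets, long bucketMillis, LongSupplier clock) {
        this.bucketMillis = bucketMillis;
        this.bucketMax = new long[buckets];
        this.bucketIdx = new long[buckets];
        this.clock = clock;
        Arrays.fill(bucketIdx, -1L);
    }

    /** Called whenever the queue size changes, e.g. after a message is enqueued. */
    synchronized void record(long queueSize) {
        long idx = clock.getAsLong() / bucketMillis;
        int slot = (int)(idx % bucketMax.length);

        if (bucketIdx[slot] != idx) {
            // The slot holds data from an older bucket: reuse it for the current one.
            bucketIdx[slot] = idx;
            bucketMax[slot] = queueSize;
        }
        else
            bucketMax[slot] = Math.max(bucketMax[slot], queueSize);
    }

    /** Returns the largest recorded value within the window, or 0 if nothing was recorded. */
    synchronized long peak() {
        long cur = clock.getAsLong() / bucketMillis;
        long oldest = cur - bucketMax.length + 1;
        long max = 0;

        for (int i = 0; i < bucketMax.length; i++) {
            if (bucketIdx[i] >= oldest && bucketIdx[i] <= cur)
                max = Math.max(max, bucketMax[i]);
        }

        return max;
    }
}
```

With, say, six 10-second buckets, a collector that polls once every 30 seconds would still observe a queue spike that lasted only a few seconds, because the spike stays visible through peak() until its bucket leaves the 60-second window. The window length would need to be at least as long as the collection interval for this to hold.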



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
