[
https://issues.apache.org/jira/browse/IGNITE-26209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aleksey Plekhanov updated IGNITE-26209:
---------------------------------------
Ignite Flags: Release Notes Required (was: Docs Required,Release Notes
Required)
> Add metrics to improve node network unavailability detection
> ------------------------------------------------------------
>
> Key: IGNITE-26209
> URL: https://issues.apache.org/jira/browse/IGNITE-26209
> Project: Ignite
> Issue Type: Improvement
> Reporter: Aleksey Plekhanov
> Assignee: Aleksey Plekhanov
> Priority: Major
> Labels: ise
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Metrics collection systems sample values at discrete intervals. Between two
> collections, a metric can exceed its critical threshold and return to
> normal before the next sample is taken. For example, a short-term network
> unavailability of a node can lead to a significant increase in operation
> latency. The problem with that particular node could be detected by
> observing growth of the outgoing message queue for the TCP Communication
> SPI on that node. However, if the metrics collection interval is longer
> than the node's downtime, such spikes might go unnoticed. We need metrics
> that record bursts in the accumulated message queues of the
> Discovery/Communication SPIs over a certain period of time.
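
One way to picture the requested metric is a gauge that reports the largest queue size seen during a sliding time window rather than the instantaneous value. The sketch below is illustrative only and is not the implementation from this ticket: the class name PeakQueueSizeTracker, its methods, and the bucket parameters are hypothetical and not part of the Ignite API. It assumes the caller passes non-negative timestamps in milliseconds and calls record() whenever the queue size changes.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not an Ignite class): remembers the maximum queue size
 * observed within a sliding window made of fixed-length time buckets.
 */
final class PeakQueueSizeTracker {
    /** Length of one bucket in milliseconds. */
    private final long bucketMillis;

    /** Bucket number (timestamp / bucketMillis) currently stored in each slot. */
    private final long[] bucketEpoch;

    /** Maximum queue size recorded for the bucket stored in each slot. */
    private final int[] bucketMax;

    PeakQueueSizeTracker(long bucketMillis, int buckets) {
        if (bucketMillis <= 0 || buckets <= 0)
            throw new IllegalArgumentException("bucketMillis and buckets must be positive");

        this.bucketMillis = bucketMillis;
        this.bucketEpoch = new long[buckets];
        this.bucketMax = new int[buckets];

        // Mark every slot as empty.
        Arrays.fill(bucketEpoch, Long.MIN_VALUE);
    }

    /** Records the current queue size at the given time. */
    synchronized void record(int queueSize, long nowMillis) {
        long epoch = nowMillis / bucketMillis;
        int idx = (int)(epoch % bucketEpoch.length);

        if (bucketEpoch[idx] != epoch) {
            // The slot holds a stale bucket: reuse it for the current one.
            bucketEpoch[idx] = epoch;
            bucketMax[idx] = queueSize;
        }
        else
            bucketMax[idx] = Math.max(bucketMax[idx], queueSize);
    }

    /** Returns the largest queue size recorded within the window ending at nowMillis. */
    synchronized int peak(long nowMillis) {
        long epoch = nowMillis / bucketMillis;
        long oldest = epoch - bucketEpoch.length + 1;
        int max = 0;

        for (int i = 0; i < bucketEpoch.length; i++) {
            // Skip empty slots and buckets that have fallen out of the window.
            if (bucketEpoch[i] >= oldest && bucketEpoch[i] <= epoch)
                max = Math.max(max, bucketMax[i]);
        }

        return max;
    }
}
```

If the window is at least as long as the metrics collection interval, a burst that starts and ends between two collections still shows up in the next collected value of peak().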
--
This message was sent by Atlassian Jira
(v8.20.10#820010)