Alex,

Did you measure the impact of metrics collection? What overhead are you trying to avoid?
Just to make it clear, MetricUpdateMessage-s are used as heartbeats, so they are sent anyway, even if no metrics are distributed between nodes.

Denis

Tue, Dec 4, 2018 at 12:46, Alex Plehanov <plehanov.a...@gmail.com>:
> Hi Igniters,
>
> In the current implementation, cache metrics are collected on each node
> and sent across the whole cluster with a discovery message
> (TcpDiscoveryMetricsUpdateMessage) at the configured frequency
> (MetricsUpdateFrequency, 2 seconds by default), even if no one has
> requested them.
> If there are a lot of caches and a lot of nodes in the cluster, the
> metrics update message (which contains each metric for each cache on
> each node) can reach a critical size.
>
> Also, frequently collecting all cache metrics has a negative performance
> impact (some of them just read a value from an AtomicLong, but some of
> them need an iteration over all cache partitions).
> The only way now to disable collecting cache metrics and sending them
> with the discovery message is to disable statistics for each cache. But
> this also makes it impossible to request some cache metrics locally (for
> the current node only). Requesting a limited set of cache metrics on the
> current node doesn't have the same performance impact as frequently
> collecting all cache metrics, and sometimes it's enough for diagnostic
> purposes.
>
> As a workaround I have filed and implemented ticket [1], which
> introduces a new system property to disable sending cache metrics with
> TcpDiscoveryMetricsUpdateMessage (if this property is set, the message
> will contain only node metrics). But a system property is not good as a
> permanent solution. Perhaps it's better to move this property to the
> public API (to IgniteConfiguration, for example).
>
> Also, maybe we should change the cache metrics distribution strategy?
> For example, collect metrics on request via the communication SPI, or
> subscribe to a limited set of caches/metrics, etc.
>
> Thoughts?
>
> [1]: https://issues.apache.org/jira/browse/IGNITE-10172
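
To put a rough number on the message-size concern in the quoted mail (each metric for each cache on each node), here is a back-of-the-envelope sketch. The class name, the node/cache counts, the number of metrics per cache, and the 8 bytes per metric are all made-up assumptions for illustration; they are not taken from Ignite's actual serialization format, so treat the result as an order-of-magnitude guess only.

```java
public class MetricsMessageSizeEstimate {

    /**
     * Rough payload size if every node reports every metric for every cache.
     * Ignores headers, node metrics, and any compression (assumption).
     */
    static long estimateBytes(int nodes, int caches, int metricsPerCache, int bytesPerMetric) {
        // Widen to long first so large clusters do not overflow int arithmetic.
        return (long) nodes * caches * metricsPerCache * bytesPerMetric;
    }

    public static void main(String[] args) {
        // All four inputs are hypothetical values, not measured ones.
        int nodes = 100;
        int caches = 200;
        int metricsPerCache = 50;
        int bytesPerMetric = 8; // e.g. one long value per metric

        long bytes = estimateBytes(nodes, caches, metricsPerCache, bytesPerMetric);
        System.out.printf("~%.1f MB per metrics update message%n", bytes / 1_000_000.0);
    }
}
```

With MetricsUpdateFrequency at its default of 2 seconds, a payload of this kind would be produced on every update cycle, which is why measured numbers from a real cluster would be more convincing than this estimate.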