Hi, Alex. IMO:

1. Distributing metrics through discovery requires refactoring.
2. Local cache metrics should be available (if configured) on each node.
3. It should be possible to configure metrics at runtime.
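For 2 and 3, a minimal sketch of what this could look like with the existing public API ("myCache" is just an example cache name, and the runtime toggle assumes IgniteCluster.enableStatistics is available in your Ignite version):

    import java.util.Collections;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.CacheMetrics;

    public class LocalCacheMetricsSketch {
        public static void main(String[] args) {
            try (Ignite ignite = Ignition.start()) {
                IgniteCache<Integer, String> cache = ignite.getOrCreateCache("myCache");

                // 3. Turn statistics on for selected caches at runtime instead of
                //    restarting nodes with a changed CacheConfiguration.
                ignite.cluster().enableStatistics(Collections.singleton("myCache"), true);

                cache.put(1, "one");
                cache.get(1);

                // 2. Read metrics collected on the local node only, without waiting
                //    for a cluster-wide metrics update message.
                CacheMetrics local = cache.localMetrics();
                System.out.println("Local gets: " + local.getCacheGets());
            }
        }
    }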
thanks.

> Hi Igniters,
>
> In the current implementation, cache metrics are collected on each node and
> sent across the whole cluster with a discovery message
> (TcpDiscoveryMetricsUpdateMessage) at a configured frequency
> (MetricsUpdateFrequency, 2 seconds by default), even if no one has requested
> them. If there are a lot of caches and a lot of nodes in the cluster, the
> metrics update message (which contains every metric for every cache on every
> node) can reach a critical size.
>
> Frequently collecting all cache metrics also has a negative performance
> impact (some of them just read a value from an AtomicLong, but some require
> an iteration over all cache partitions). The only way now to disable
> collecting cache metrics and sending them with the discovery message is to
> disable statistics for each cache, but that also makes it impossible to
> request some cache metrics locally (for the current node only). Requesting a
> limited set of cache metrics on the current node doesn't have the same
> performance impact as frequently collecting all of them, and it is sometimes
> enough for diagnostic purposes.
>
> As a workaround I have filed and implemented ticket [1], which introduces a
> new system property to disable sending cache metrics with
> TcpDiscoveryMetricsUpdateMessage (when this property is set, the message
> contains only node metrics). But a system property is not a good permanent
> solution; perhaps it's better to move such a setting to the public API
> (IgniteConfiguration, for example).
>
> Also, maybe we should change the cache metrics distribution strategy? For
> example, collect metrics on request via communication SPI, or subscribe to a
> limited set of caches/metrics, etc.
>
> Thoughts?
>
> [1]: https://issues.apache.org/jira/browse/IGNITE-10172

--
Zhenya Stanilovsky
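For completeness, the knobs mentioned in the quoted message look roughly like this today (a minimal sketch; the values and cache name are illustrative):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.CacheConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class MetricsConfigSketch {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration();

            // How often metrics are refreshed and sent with
            // TcpDiscoveryMetricsUpdateMessage; 2000 ms is the default.
            cfg.setMetricsUpdateFrequency(2000);

            // Currently the only per-cache switch: disabling statistics stops cache
            // metrics collection entirely, so local requests stop being useful too.
            CacheConfiguration<Integer, String> cacheCfg = new CacheConfiguration<>("myCache");
            cacheCfg.setStatisticsEnabled(false);
            cfg.setCacheConfiguration(cacheCfg);

            try (Ignite ignite = Ignition.start(cfg)) {
                // Node is up; cache metrics for "myCache" are not collected.
            }
        }
    }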