[
https://issues.apache.org/jira/browse/IGNITE-9135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergey Kosarev updated IGNITE-9135:
-----------------------------------
Description:
On a large topology (about 200 servers / 50 clients) we often see, via JMX
(TcpDiscoverySpiMBean), high MessageWorkerQueueSize peaks (>100) while the
cluster topology is stable. The ProcessedMessages and ReceivedMessages counts
for TcpDiscoveryStatusCheckMessage are also very high (about 250000), whereas
TcpDiscoveryMetricsUpdateMessage is at about 110000.
In fact, the
[org.apache.ignite.spi.discovery.tcp.ServerImpl.RingMessageWorker#metricsCheckFreq|https://github.com/apache/ignite/blob/d73211d2c1fc3897681a3abdc98c5eb383d24475/modules/core/src/main/java/org/apache/ignite/spi/discovery/tcp/ServerImpl.java#L2628]
value does not depend on topology size:
private long metricsCheckFreq = 3 * spi.metricsUpdateFreq + 50;
Why do we see such peaks on a stable topology?
Consider changing the metricsCheckFreq formula so that it depends on topology size.
was:
On a large topology (about 200 servers / 50 clients) we often see, via JMX
(TcpDiscoverySpiMBean), high MessageWorkerQueueSize peaks (>100) while the
cluster topology is stable. The ProcessedMessages and ReceivedMessages counts
for TcpDiscoveryStatusCheckMessage are also very high (about 250000), whereas
TcpDiscoveryMetricsUpdateMessage is at about 110000.
It looks like the
org.apache.ignite.spi.discovery.tcp.ServerImpl.RingMessageWorker#metricsCheckFreq
value does not depend on topology size:
private long metricsCheckFreq = 3 * spi.metricsUpdateFreq + 50;
Why do we see such peaks on a stable topology?
Consider changing the metricsCheckFreq formula so that it depends on topology size.
> TcpDiscovery - High Workload in Stable topology (MessageWorkerQueueSize peaks)
> ------------------------------------------------------------------------------
>
> Key: IGNITE-9135
> URL: https://issues.apache.org/jira/browse/IGNITE-9135
> Project: Ignite
> Issue Type: Bug
> Reporter: Sergey Kosarev
> Priority: Major
> Attachments: IMG_20180731_014146_HDR.jpg, IMG_20180731_015439_HDR.jpg
>
>
> On a large topology (about 200 servers / 50 clients) we often see, via JMX
> (TcpDiscoverySpiMBean), high MessageWorkerQueueSize peaks (>100) while the
> cluster topology is stable. The ProcessedMessages and ReceivedMessages counts
> for TcpDiscoveryStatusCheckMessage are also very high (about 250000), whereas
> TcpDiscoveryMetricsUpdateMessage is at about 110000.
> In fact, the
> [org.apache.ignite.spi.discovery.tcp.ServerImpl.RingMessageWorker#metricsCheckFreq|https://github.com/apache/ignite/blob/d73211d2c1fc3897681a3abdc98c5eb383d24475/modules/core/src/main/java/org/apache/ignite/spi/discovery/tcp/ServerImpl.java#L2628]
> value does not depend on topology size:
> private long metricsCheckFreq = 3 * spi.metricsUpdateFreq + 50;
>
> Why do we see such peaks on a stable topology?
> Consider changing the metricsCheckFreq formula so that it depends on topology size.
>
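To make the formula quoted above concrete, here is a minimal Java sketch of the current computation next to one hypothetical topology-aware variant. Only the expression 3 * metricsUpdateFreq + 50 comes from the issue. The class name, the method names, the per-node allowance parameter, and the 2000 ms update frequency used in main are assumptions made for illustration; they are not taken from ServerImpl and are not a proposed patch.

```java
public class MetricsCheckFreqSketch {
    // The formula quoted in the issue: independent of topology size.
    static long currentCheckFreq(long metricsUpdateFreqMs) {
        return 3 * metricsUpdateFreqMs + 50;
    }

    // Hypothetical variant: adds a fixed allowance per server node, so the
    // check interval grows with the ring. Both the linear shape and the
    // perNodeAllowanceMs parameter are assumptions, not Ignite API.
    static long scaledCheckFreq(long metricsUpdateFreqMs, int serverNodes, long perNodeAllowanceMs) {
        return currentCheckFreq(metricsUpdateFreqMs) + (long) serverNodes * perNodeAllowanceMs;
    }

    public static void main(String[] args) {
        long updateFreqMs = 2000; // assumed metrics update frequency, in ms

        // 3 * 2000 + 50 = 6050, regardless of how many nodes are in the ring
        System.out.println(currentCheckFreq(updateFreqMs));

        // 6050 + 200 * 10 = 8050 for 200 servers with a 10 ms allowance each
        System.out.println(scaledCheckFreq(updateFreqMs, 200, 10));
    }
}
```

With the current formula the interval is the same for 2 nodes and for 200. Whether a linear per-node term is the right shape, and what allowance would be reasonable, would need to be measured on a real cluster.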