boglesby commented on pull request #7378: URL: https://github.com/apache/geode/pull/7378#issuecomment-1068616384
I'm not sure how to resolve the race condition you mention, but I see similar behavior with client/server connections. If a burst of connections is requested and none of those are made before the next load is received from the server, then the locator's load for that server gets reset back to zero.

A burst of connections (10 in this case) causes the load to go from 0.0 to 0.012499998:

```
[warn 2022/03/15 14:38:37.905 PDT locator <locator request thread 1> tid=0x24] XXX LocatorLoadSnapshot.getServerForConnection potentialServers={192.168.1.5:51249@192.168.1.5(server1:30200)<v1>:41001=LoadHolder[0.0, 192.168.1.5:51249, loadPollInterval=5000, 0.00125]}
[warn 2022/03/15 14:38:37.906 PDT locator <locator request thread 1> tid=0x24] XXX LocatorLoadSnapshot.getServerForConnection selectedServer=192.168.1.5:51249; loadBeforeUpdate=0.0
[warn 2022/03/15 14:38:37.907 PDT locator <locator request thread 1> tid=0x24] XXX LoadHolder.incConnections location=192.168.1.5:51249; load=0.00125
[warn 2022/03/15 14:38:37.907 PDT locator <locator request thread 1> tid=0x24] XXX LocatorLoadSnapshot.getServerForConnection selectedServer=192.168.1.5:51249; loadAfterUpdate=0.00125
...
[warn 2022/03/15 14:38:38.005 PDT locator <locator request thread 1> tid=0x24] XXX LocatorLoadSnapshot.getServerForConnection potentialServers={192.168.1.5:51249@192.168.1.5(server1:30200)<v1>:41001=LoadHolder[0.011249999, 192.168.1.5:51249, loadPollInterval=5000, 0.00125]}
[warn 2022/03/15 14:38:38.005 PDT locator <locator request thread 1> tid=0x24] XXX LocatorLoadSnapshot.getServerForConnection selectedServer=192.168.1.5:51249; loadBeforeUpdate=0.011249999
[warn 2022/03/15 14:38:38.005 PDT locator <locator request thread 1> tid=0x24] XXX LoadHolder.incConnections location=192.168.1.5:51249; load=0.012499998
[warn 2022/03/15 14:38:38.005 PDT locator <locator request thread 1> tid=0x24] XXX LocatorLoadSnapshot.getServerForConnection selectedServer=192.168.1.5:51249; loadAfterUpdate=0.012499998
```

If none of those connections are made before the next load is sent by that server, its load goes from 0.012499998 to 0.0:

```
[warn 2022/03/15 14:39:25.140 PDT locator <P2P message reader for 192.168.1.5(server1:30200)<v1>:41001 unshared ordered sender uid=5 dom #1 local port=55139 remote port=51286> tid=0x56] XXX LocatorLoadSnapshot.updateLoad about to update connectionLoadMap location=192.168.1.5:51249; load=0.0; loadPerConnection=0.00125
[warn 2022/03/15 14:39:25.140 PDT locator <P2P message reader for 192.168.1.5(server1:30200)<v1>:41001 unshared ordered sender uid=5 dom #1 local port=55139 remote port=51286> tid=0x56] XXX LocatorLoadSnapshot.updateMap location=192.168.1.5:51249; loadBeforeUpdate=0.012499998
[warn 2022/03/15 14:39:25.141 PDT locator <P2P message reader for 192.168.1.5(server1:30200)<v1>:41001 unshared ordered sender uid=5 dom #1 local port=55139 remote port=51286> tid=0x56] XXX LocatorLoadSnapshot.updateMap location=192.168.1.5:51249; loadAfterUpdate=0.0
[warn 2022/03/15 14:39:25.141 PDT locator <P2P message reader for 192.168.1.5(server1:30200)<v1>:41001 unshared ordered sender uid=5 dom #1 local port=55139 remote port=51286> tid=0x56] XXX LocatorLoadSnapshot.updateLoad done update connectionLoadMap location=192.168.1.5:51249
```

The load for the next request starts at 0.0 again:

```
[warn 2022/03/15 14:39:33.475 PDT locator <locator request thread 2> tid=0x54] XXX LocatorLoadSnapshot.getServerForConnection potentialServers={192.168.1.5:51249@192.168.1.5(server1:30200)<v1>:41001=LoadHolder[0.0, 192.168.1.5:51249, loadPollInterval=5000, 0.00125]}
[warn 2022/03/15 14:39:33.475 PDT locator <locator request thread 2> tid=0x54] XXX LocatorLoadSnapshot.getServerForConnection selectedServer=192.168.1.5:51249; loadBeforeUpdate=0.0
[warn 2022/03/15 14:39:33.475 PDT locator <locator request thread 2> tid=0x54] XXX LoadHolder.incConnections location=192.168.1.5:51249; load=0.00125
[warn 2022/03/15 14:39:33.475 PDT locator <locator request thread 2> tid=0x54] XXX LocatorLoadSnapshot.getServerForConnection selectedServer=192.168.1.5:51249; loadAfterUpdate=0.00125
...
```

One thing to note is that the load is only sent every load-poll-interval (default 5 seconds) if it has changed. If it hasn't changed, it only gets sent once every forced-update period, which is 10 load-poll-intervals (10 * 5 = 50 seconds) by default. There is an integer property to control that frequency too:

```
private static final int FORCE_LOAD_UPDATE_FREQUENCY = getInteger(
    GeodeGlossary.GEMFIRE_PREFIX + "BridgeServer.FORCE_LOAD_UPDATE_FREQUENCY", 10);
```

The load-poll-interval is configurable, but currently only for the cache server, not the gateway receiver. It probably wouldn't be too hard to add this support to the gateway receiver.

Also, there is a gfsh load-balance gateway-sender command that could help alleviate this condition.

I'm still reviewing the PR.
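
To make the sequence in the logs easier to follow, here is a minimal sketch of the reset. It is not Geode's source: the class `LoadResetSketch` and the helpers `addServer` and `getLoad` are hypothetical, and the bodies of `incConnections`, `getServerForConnection` and `updateLoad` are assumptions that only mirror what the log output suggests (the locator adds `loadPerConnection` per handed-out connection, and a load reported by the server overwrites that estimate).

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the locator's per-server load bookkeeping.
// Method names mirror the log output; the bodies are assumptions, not Geode's code.
class LoadResetSketch {

  static final class LoadHolder {
    private float load;
    private final float loadPerConnection;

    LoadHolder(float load, float loadPerConnection) {
      this.load = load;
      this.loadPerConnection = loadPerConnection;
    }

    // Locator-side estimate: bump the load for a connection that was handed out.
    void incConnections() {
      load += loadPerConnection;
    }
  }

  private final Map<String, LoadHolder> connectionLoadMap = new HashMap<>();

  // Hypothetical helper: register a server with its initial load.
  synchronized void addServer(String location, float load, float loadPerConnection) {
    connectionLoadMap.put(location, new LoadHolder(load, loadPerConnection));
  }

  // A client asked for a server: count the connection before it actually exists.
  synchronized void getServerForConnection(String location) {
    connectionLoadMap.get(location).incConnections();
  }

  // The server reported its load: the locator's estimate is overwritten.
  synchronized void updateLoad(String location, float reportedLoad) {
    connectionLoadMap.get(location).load = reportedLoad;
  }

  // Hypothetical helper: read the locator's current estimate.
  synchronized float getLoad(String location) {
    return connectionLoadMap.get(location).load;
  }

  public static void main(String[] args) {
    String server = "192.168.1.5:51249";
    LoadResetSketch snapshot = new LoadResetSketch();
    snapshot.addServer(server, 0.0f, 0.00125f);

    // Burst of 10 connection requests; none of them is established yet.
    for (int i = 0; i < 10; i++) {
      snapshot.getServerForConnection(server);
    }
    System.out.println("after burst:  " + snapshot.getLoad(server));

    // The server still sees no connections, so it reports 0.0 and the estimate is lost.
    snapshot.updateLoad(server, 0.0f);
    System.out.println("after update: " + snapshot.getLoad(server));
  }
}
```

In this model the estimate built up by the burst is simply replaced by whatever the server last measured, which is the same pattern as the `loadBeforeUpdate=0.012499998` / `loadAfterUpdate=0.0` pair above.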