boglesby commented on pull request #7378:
URL: https://github.com/apache/geode/pull/7378#issuecomment-1068616384


   I'm not sure how to resolve the race condition you mention, but I see 
similar behavior with client/server connections.
   
   If a burst of connections is requested and none of those are made before the 
next load is received from the server, then the locator's load for that server 
gets reset back to zero.
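The reset can be modeled in a few lines of Java. This is a minimal sketch that assumes a simplified holder keeping one float per server. `LoadResetSketch`, `SimpleLoadHolder`, `incConnections` and `setLoad` are hypothetical stand-ins, not Geode's actual `LoadHolder` or `LocatorLoadSnapshot` code. The locator request thread bumps the estimate by `loadPerConnection` for every connection it hands out. The server's periodic report then overwrites that estimate outright, so increments for connections that have not been established yet are lost.

```java
/**
 * Hypothetical, simplified model of the locator-side load bookkeeping seen in the logs.
 * SimpleLoadHolder is an illustrative stand-in, not Geode's LoadHolder.
 */
public class LoadResetSketch {

  static final class SimpleLoadHolder {
    private final float loadPerConnection;
    private float load;

    SimpleLoadHolder(float initialLoad, float loadPerConnection) {
      this.load = initialLoad;
      this.loadPerConnection = loadPerConnection;
    }

    // Locator request thread: optimistically add the estimated cost of one new connection.
    synchronized void incConnections() {
      load += loadPerConnection;
    }

    // Load report from the server: replaces the locator's estimate outright,
    // discarding increments for connections the server has not seen yet.
    synchronized void setLoad(float reportedLoad) {
      load = reportedLoad;
    }

    synchronized float getLoad() {
      return load;
    }
  }

  public static void main(String[] args) {
    SimpleLoadHolder holder = new SimpleLoadHolder(0.0f, 0.00125f);

    // A burst of 10 connection requests is answered before any client actually connects.
    for (int i = 0; i < 10; i++) {
      holder.incConnections();
    }
    float afterBurst = holder.getLoad();

    // The server's next report still says 0.0, because none of those connections exist yet.
    holder.setLoad(0.0f);
    float afterReport = holder.getLoad();

    System.out.println("afterBurst=" + afterBurst + ", afterReport=" + afterReport);
  }
}
```

Running it shows a burst load of roughly 0.0125 that drops back to 0.0 after the report. The burst value is not exactly 0.0125 because 0.00125 is not exactly representable as a float, which is also why the logs show 0.012499998.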
   
   A burst of connections (10 in this case) causes the load to go from 0.0 to 
0.012499998:
   ```
   [warn 2022/03/15 14:38:37.905 PDT locator <locator request thread 1> tid=0x24] XXX LocatorLoadSnapshot.getServerForConnection potentialServers={192.168.1.5:51249@192.168.1.5(server1:30200)<v1>:41001=LoadHolder[0.0, 192.168.1.5:51249, loadPollInterval=5000, 0.00125]}
   
   [warn 2022/03/15 14:38:37.906 PDT locator <locator request thread 1> tid=0x24] XXX LocatorLoadSnapshot.getServerForConnection selectedServer=192.168.1.5:51249; loadBeforeUpdate=0.0
   
   [warn 2022/03/15 14:38:37.907 PDT locator <locator request thread 1> tid=0x24] XXX LoadHolder.incConnections location=192.168.1.5:51249; load=0.00125
   
   [warn 2022/03/15 14:38:37.907 PDT locator <locator request thread 1> tid=0x24] XXX LocatorLoadSnapshot.getServerForConnection selectedServer=192.168.1.5:51249; loadAfterUpdate=0.00125
   
   ...
   
   [warn 2022/03/15 14:38:38.005 PDT locator <locator request thread 1> tid=0x24] XXX LocatorLoadSnapshot.getServerForConnection potentialServers={192.168.1.5:51249@192.168.1.5(server1:30200)<v1>:41001=LoadHolder[0.011249999, 192.168.1.5:51249, loadPollInterval=5000, 0.00125]}
   
   [warn 2022/03/15 14:38:38.005 PDT locator <locator request thread 1> tid=0x24] XXX LocatorLoadSnapshot.getServerForConnection selectedServer=192.168.1.5:51249; loadBeforeUpdate=0.011249999
   
   [warn 2022/03/15 14:38:38.005 PDT locator <locator request thread 1> tid=0x24] XXX LoadHolder.incConnections location=192.168.1.5:51249; load=0.012499998
   
   [warn 2022/03/15 14:38:38.005 PDT locator <locator request thread 1> tid=0x24] XXX LocatorLoadSnapshot.getServerForConnection selectedServer=192.168.1.5:51249; loadAfterUpdate=0.012499998
   ```
   If none of those connections are made before the next load is sent by that 
server, its load goes from 0.012499998 to 0.0:
   ```
   [warn 2022/03/15 14:39:25.140 PDT locator <P2P message reader for 192.168.1.5(server1:30200)<v1>:41001 unshared ordered sender uid=5 dom #1 local port=55139 remote port=51286> tid=0x56] XXX LocatorLoadSnapshot.updateLoad about to update connectionLoadMap location=192.168.1.5:51249; load=0.0; loadPerConnection=0.00125
   
   [warn 2022/03/15 14:39:25.140 PDT locator <P2P message reader for 192.168.1.5(server1:30200)<v1>:41001 unshared ordered sender uid=5 dom #1 local port=55139 remote port=51286> tid=0x56] XXX LocatorLoadSnapshot.updateMap location=192.168.1.5:51249; loadBeforeUpdate=0.012499998
   
   [warn 2022/03/15 14:39:25.141 PDT locator <P2P message reader for 192.168.1.5(server1:30200)<v1>:41001 unshared ordered sender uid=5 dom #1 local port=55139 remote port=51286> tid=0x56] XXX LocatorLoadSnapshot.updateMap location=192.168.1.5:51249; loadAfterUpdate=0.0
   
   [warn 2022/03/15 14:39:25.141 PDT locator <P2P message reader for 192.168.1.5(server1:30200)<v1>:41001 unshared ordered sender uid=5 dom #1 local port=55139 remote port=51286> tid=0x56] XXX LocatorLoadSnapshot.updateLoad done update connectionLoadMap location=192.168.1.5:51249
   ```
   The load for the next request starts at 0.0 again:
   ```
   [warn 2022/03/15 14:39:33.475 PDT locator <locator request thread 2> tid=0x54] XXX LocatorLoadSnapshot.getServerForConnection potentialServers={192.168.1.5:51249@192.168.1.5(server1:30200)<v1>:41001=LoadHolder[0.0, 192.168.1.5:51249, loadPollInterval=5000, 0.00125]}
   
   [warn 2022/03/15 14:39:33.475 PDT locator <locator request thread 2> tid=0x54] XXX LocatorLoadSnapshot.getServerForConnection selectedServer=192.168.1.5:51249; loadBeforeUpdate=0.0
   
   [warn 2022/03/15 14:39:33.475 PDT locator <locator request thread 2> tid=0x54] XXX LoadHolder.incConnections location=192.168.1.5:51249; load=0.00125
   
   [warn 2022/03/15 14:39:33.475 PDT locator <locator request thread 2> tid=0x54] XXX LocatorLoadSnapshot.getServerForConnection selectedServer=192.168.1.5:51249; loadAfterUpdate=0.00125
   
   ...
   ```
   One thing to note is that the load is only sent every load-poll-interval (default 5 seconds) if it has changed. If it hasn't changed, it is only sent once per force-update frequency, which is 10 polls (10 * 5 = 50 seconds) by default.
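The send rule above can be sketched as a small policy object that is polled once per load-poll-interval. This is a simplified sketch, not Geode's server-side load monitor: `LoadSendPolicySketch`, `poll`, and the exact counting of unchanged polls are assumptions, chosen so that with the defaults (frequency 10, 5-second interval) an unchanged load is resent every 50 seconds.

```java
/**
 * Hypothetical sketch of the server-side send decision: send the load when it changed,
 * otherwise only once every forceUpdateFrequency polls. Not Geode's actual LoadMonitor.
 */
public class LoadSendPolicySketch {

  private final int forceUpdateFrequency;
  private boolean hasSent = false;
  private float lastSentLoad;
  private int unchangedPolls = 0;

  LoadSendPolicySketch(int forceUpdateFrequency) {
    this.forceUpdateFrequency = forceUpdateFrequency;
  }

  // Called once per load-poll-interval; returns true if the load would be sent to the locators.
  boolean poll(float currentLoad) {
    boolean changed = !hasSent || currentLoad != lastSentLoad;
    if (!changed) {
      unchangedPolls++;
      if (unchangedPolls < forceUpdateFrequency) {
        return false; // unchanged and not yet due for a forced update
      }
    }
    hasSent = true;
    lastSentLoad = currentLoad;
    unchangedPolls = 0;
    return true;
  }

  public static void main(String[] args) {
    long loadPollIntervalMs = 5000; // default load-poll-interval
    int forceUpdateFrequency = 10;  // default FORCE_LOAD_UPDATE_FREQUENCY
    LoadSendPolicySketch policy = new LoadSendPolicySketch(forceUpdateFrequency);

    // The first poll always sends; then count polls until the unchanged load is sent again.
    policy.poll(0.0f);
    int polls = 0;
    boolean sent;
    do {
      polls++;
      sent = policy.poll(0.0f);
    } while (!sent);

    System.out.println("unchanged load resent after " + polls + " polls = "
        + (polls * loadPollIntervalMs / 1000) + " s");
  }
}
```

With the defaults, the loop finds the unchanged load resent after 10 polls, i.e. 50 seconds, which is the window during which a burst of locator-side increments can sit unconfirmed before a report resets it.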
   
   There is an integer system property to control that frequency too:
   ```
   private static final int FORCE_LOAD_UPDATE_FREQUENCY = getInteger(
     GeodeGlossary.GEMFIRE_PREFIX + "BridgeServer.FORCE_LOAD_UPDATE_FREQUENCY", 10);
   ```
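`getInteger` here is presumably `Integer.getInteger(String, int)`, which reads a JVM system property and falls back to the given default. The sketch below shows that lookup in isolation. It assumes `GeodeGlossary.GEMFIRE_PREFIX` is `"gemfire."`, and `ForceLoadUpdateFrequencySketch` is a hypothetical class name.

```java
public class ForceLoadUpdateFrequencySketch {
  // Assumes GeodeGlossary.GEMFIRE_PREFIX is "gemfire."; written out as a literal here.
  static final String PROPERTY = "gemfire.BridgeServer.FORCE_LOAD_UPDATE_FREQUENCY";

  static int readFrequency() {
    // Integer.getInteger reads a JVM system property, falling back to the default of 10.
    return Integer.getInteger(PROPERTY, 10);
  }

  public static void main(String[] args) {
    System.out.println("before override: " + readFrequency());
    // Same effect as starting the JVM with -Dgemfire.BridgeServer.FORCE_LOAD_UPDATE_FREQUENCY=2
    System.setProperty(PROPERTY, "2");
    System.out.println("after override: " + readFrequency());
  }
}
```

Because the real field is `static final`, it is read once when its class is initialized, so in practice the property would need to be passed on the server's command line rather than changed at runtime.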
   The load-poll-interval is configurable, but currently only for cache servers, not for gateway receivers. It probably wouldn't be too hard to add this support to the gateway receiver.
   
   Also, the gfsh `load-balance gateway-sender` command could help alleviate this condition.
   
   I'm still reviewing the PR.


-- 
This is an automated message from the Apache Git Service.