moseleymark edited a comment on issue #7471:
URL: https://github.com/apache/trafficserver/issues/7471#issuecomment-840919253


   I've been running 9.0.1 for about 24 hours and seeing this as well. The 
internal connection count seems to just tick up and up, but the connections 
don't seem to exist. Running traffic_ctl over and over, I do see it go *down* 
at times but the overall trend is to grow and grow.
   
   15867 is the pid of traffic_server
   
   > lsof -n -p 15867 | wc -l
   106540
   
   > traffic_ctl metric get proxy.process.net.connections_currently_open
   proxy.process.net.connections_currently_open 108760
   
   > traffic_ctl metric get proxy.process.http.current_server_connections
   proxy.process.http.current_server_connections 160043
   
   > traffic_ctl metric get proxy.process.current_server_connections
   proxy.process.current_server_connections 159993
   
   We don't use parents so this is normal...
   > traffic_ctl metric get proxy.process.http.current_parent_proxy_connections
   proxy.process.http.current_parent_proxy_connections 0
   
   > netstat -ant | wc -l
   25203
   
   > netstat -antp | grep 15867 | wc -l
   8640
   
   So, going by connections_currently_open, ATS somehow thinks it has roughly 
4x the number of connections that the kernel thinks the entire box has, and 
roughly 12x the number of connections that the kernel thinks ATS has.
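   
   For anyone wanting to reproduce that arithmetic on their own box, here is a 
rough sketch using the figures pasted above. The `ratio` helper and the 
variable names are made up for illustration; in practice the three values would 
come from the traffic_ctl and netstat commands shown earlier.

```shell
# ratio A B: print A/B rounded to one decimal place ("n/a" if B is zero).
# Hypothetical helper, not part of ATS or traffic_ctl.
ratio() {
    awk -v a="$1" -v b="$2" \
        'BEGIN { if (b == 0) print "n/a"; else printf "%.1f\n", a / b }'
}

# Figures copied from the output above.
ats_open=108760     # proxy.process.net.connections_currently_open
box_sockets=25203   # netstat -ant | wc -l
ats_sockets=8640    # netstat -antp | grep 15867 | wc -l

echo "ATS metric vs. whole box:   $(ratio "$ats_open" "$box_sockets")x"
echo "ATS metric vs. ATS sockets: $(ratio "$ats_open" "$ats_sockets")x"
```

   With these numbers it prints 4.3x and 12.6x. The netstat line counts include 
a couple of header lines, so the ratios are approximate.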
   
   I'm also seeing a ton more CLOSE_WAITs compared to 8.1.1:
   
   > netstat -ant | grep CLOSE | wc -l
   4462
   
   On the other boxes in the same 3-node cluster (servicing the same VIPs and 
same backend origin sites, so extremely similar workload):
   
   > netstat -ant | grep CLOSE_WAIT | wc -l
   39
   
   > netstat -ant | grep CLOSE_WAIT | wc -l
   148
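   
   Since grepping for CLOSE can also match other states (CLOSING, for 
instance), a per-state tally may make the comparison between nodes a bit 
cleaner. This is only a sketch: `count_states` is a made-up helper name, and it 
assumes the usual Linux `netstat -ant` layout with the state in the sixth 
column.

```shell
# count_states: read `netstat -ant` output on stdin and print one
# "STATE COUNT" line per TCP state, sorted by state name.
# Assumes the state is the 6th whitespace-separated column.
# Usage: netstat -ant | count_states
count_states() {
    awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print s, n[s] }' | sort
}
```

   Running `netstat -ant | count_states` on each node gives an exact CLOSE_WAIT 
count next to ESTABLISHED, TIME_WAIT and the rest, which should be easier to 
diff between the 9.0.1 and 8.1.1 boxes.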
   
   All my testing looked good (the throughput is in line with the other 2 nodes 
in the cluster; sites load OK; etc.), and I wouldn't have even noticed this 
except that we bumped into the connection limit:
   
   [May 13 10:01:50.185] [ACCEPT 0:8080] WARNING: too many connections, 
throttling.  connection_type=ACCEPT, current_connections=262149, 
net_connections_throttle=262144
   [May 13 10:11:50.188] [ACCEPT 0:8443] WARNING: too many connections, 
throttling.  connection_type=ACCEPT, current_connections=262169, 
net_connections_throttle=262144
   [May 13 10:21:50.197] [ACCEPT 0:8443] WARNING: too many connections, 
throttling.  connection_type=ACCEPT, current_connections=262263, 
net_connections_throttle=262144
   [May 13 10:31:50.198] [ACCEPT 0:8443] WARNING: too many connections, 
throttling.  connection_type=ACCEPT, current_connections=262208, 
net_connections_throttle=262144
   [May 13 10:41:50.199] [ACCEPT 0:8080] WARNING: too many connections, 
throttling.  connection_type=ACCEPT, current_connections=262147, 
net_connections_throttle=262144
   
   I raised proxy.config.net.connections_throttle but the current count seems 
to creep up and up. I was scouring the upgrade docs to see if I missed 
something (which could still be the case) and came upon this ticket, so I 
figured I'd add some info.
   
   No idea if it's related to this but in 9.0.1, I also see a steady stream of 
things like:
   
   20210513.20h26m36s BODY_FACTORY: suppressing 'connect#failed_connect' 
response for url 'X@'
   
   and
   
   20210513.20h26m41s BODY_FACTORY: suppressing 'connect#failed_connect' 
response for url ''
   
   My 8.1.1 nodes (same VIP, same workload, same backend origin sites) don't 
have a single equivalent log entry. They of course have entries for other URLs, 
but always for valid-ish looking URLs, never '' nor 'X@'. I've yet to find a 
corresponding log entry on the nginx instance that's fronting ATS. Again, I 
just mention this on the off chance that it's related to the above connection 
count issue. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

