smalenfant opened a new issue #7710:
URL: https://github.com/apache/trafficserver/issues/7710


   After we upgraded from ATS 7.1.x to ATS 8.1.x, our backbone team performed 
several maintenance windows and noticed an impact on our CDN delivery. After the 
third night, we finally found that something was wrong with the newly installed 
code.
   
   ```
   # cat /opt/trafficserver/etc/trafficserver/records.config | grep connect_attempts_timeout
   CONFIG proxy.config.http.connect_attempts_timeout INT 2
   CONFIG proxy.config.http.parent_proxy.connect_attempts_timeout INT 2
   ```
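
A minimal sketch for checking the same two settings programmatically, assuming the usual `CONFIG <name> <type> <value>` line layout shown in the grep output. The function name `read_int_settings` is hypothetical and not part of ATS.

```python
def read_int_settings(text, needle):
    """Return {name: value} for CONFIG ... INT lines whose name contains needle."""
    settings = {}
    for line in text.splitlines():
        parts = line.split()
        # Expect exactly: CONFIG <name> INT <value>
        if len(parts) == 4 and parts[0] == "CONFIG" and parts[2] == "INT":
            if needle in parts[1]:
                settings[parts[1]] = int(parts[3])
    return settings


# The two lines from the grep output above.
sample = """
CONFIG proxy.config.http.connect_attempts_timeout INT 2
CONFIG proxy.config.http.parent_proxy.connect_attempts_timeout INT 2
"""
timeouts = read_int_settings(sample, "connect_attempts_timeout")
for name, value in timeouts.items():
    print(f"{name} = {value}s")
```

To run it against a live box, pass the contents of `records.config` in place of `sample`.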
   
   In short, connect_attempts_timeout (at least for parent selection) is not 
observed in two scenarios:
   
   - ICMP comes back with "No Route to Host". This appears to affect the 
timeouts: under ARP failures we have seen timeouts of 3 to 6 seconds (single 
stack to dual stack).
   - A host cannot be connected to (example: iptables DROP on port 80). The 
timeouts observed were 60 seconds.
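
As a point of comparison outside ATS, the sketch below times a plain TCP connect with a client-side timeout. It only illustrates what a 2-second connect timeout looks like from a client. It is not how ATS implements `connect_attempts_timeout`, and the helper name `time_connect` is hypothetical. Note that Python's `socket.create_connection` applies the timeout to each resolved address in turn, so a dual-stack name can take roughly twice as long to fail.

```python
import socket
import time


def time_connect(host, port, timeout_s):
    """Attempt a TCP connect; return (elapsed_seconds, outcome)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            outcome = "connected"
    except socket.timeout:
        # No SYN-ACK within timeout_s, e.g. packets silently dropped.
        outcome = "timeout"
    except OSError as exc:
        # Refused, unreachable, DNS failure, and similar errors.
        outcome = f"error: {exc.strerror or exc}"
    return time.monotonic() - start, outcome


# Example call (host taken from the logs in this issue; adjust as needed):
#   elapsed, outcome = time_connect("cdn1cdmid0003.coxlab.net", 80, 2.0)
#   print(f"{outcome} after {elapsed:.3f}s")
```

Against a host behind an iptables DROP rule, the expectation is a "timeout" outcome after about `timeout_s` seconds per address tried, which gives a baseline to compare with the ATS behaviour reported here.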
   
   Here are the results for ATS7 (7.1.1.10 to be specific) - Expected results 
within 4 seconds (OK):
   
   ```
   $ curl -I 'http://cdn1cdedge0002.hidden.cdn1.coxlab.net/test.mpg' -s -o /dev/null -w "Connect: %{time_connect} DNS: %{time_namelookup}  TTFB: %{time_starttransfer} Total time: %{time_total}"
   Connect: 1.005 DNS: 0.004  TTFB: 3.984 Total time: 3.984
   
   [Apr 16 13:46:46.661] Server {0x2b28de030700} DEBUG: (parent_select) In 
ParentConfigParams::findParent(): parent_table: 0x1ac6050.
   [Apr 16 13:46:46.661] Server {0x2b28de030700} DEBUG: (parent_select) 
policy.ParentEnable: 1
   [Apr 16 13:46:46.661] Server {0x2b28de030700} DEBUG: (parent_select) Matched 
with 0x1be6c08 parent node from line 67
   [Apr 16 13:46:46.661] Server {0x2b28de030700} DEBUG: (parent_select) Matched 
with 0x1be6158 parent node from line 29
   [Apr 16 13:46:46.662] Server {0x2b28de030700} DEBUG: (parent_select) 
ParentConsistentHash::selectParent(): Using a consistent hash parent selection 
strategy.
   [Apr 16 13:46:46.662] Server {0x2b28de030700} DEBUG: (parent_select) Chosen 
parent: cdn1cdmid0003.coxlab.net.80
   [Apr 16 13:46:46.662] Server {0x2b28de030700} DEBUG: (parent_select) 
PARENT_SPECIFIED
   [Apr 16 13:46:46.662] Server {0x2b28de030700} DEBUG: (parent_select) Result 
for origin.rd.at.cox.net was parent cdn1cdmid0003.coxlab.net:80
   [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) 
Starting ParentConsistentHash::markParentDown()
   [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) Parent 
fail count increased to 1 for cdn1cdmid0003.coxlab.net:80
   [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) 
ParentConfigParams::nextParent(): parent_table: 0x1ac6050, result->rec: 
0x1be6158
   [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) 
ParentConfigParams::nextParent(): result->r: 2, tablePtr: 0x1ac6050
   [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) Calling 
selectParent() from nextParent
   [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) 
ParentConsistentHash::selectParent(): Using a consistent hash parent selection 
strategy.
   [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) Chosen 
parent: cdn1cdmid0004.coxlab.net.80
   [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) Retry 
result for origin.rd.at.cox.net was parent cdn1cdmid0004.coxlab.net:80
   ```
   
   Here are the results for ATS8 (8.1.2 to be specific) - Expected results 
within 4 seconds (NOT OK - took 60 seconds):
   
   ```
   $ curl -I 'http://cdn1cdedge0004.hidden.cdn1.coxlab.net/test.mpg' -s -o /dev/null -w "Connect: %{time_connect} DNS: %{time_namelookup}  TTFB: %{time_starttransfer} Total time: %{time_total}"
   Connect: 0.005 DNS: 0.004  TTFB: 60.567 Total time: 60.567
   Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] 
[ET_NET 19] DEBUG: (parent_select) In ParentConfigParams::findParent(): 
parent_table: 0x1934050.
   Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] 
[ET_NET 19] DEBUG: (parent_select) policy.ParentEnable: 1
   Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] 
[ET_NET 19] DEBUG: (parent_select) Matched with 0x1a7e140 parent node from line 
45
   Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] 
[ET_NET 19] DEBUG: (parent_select) Matched with 0x1a7d498 parent node from line 
18
   Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] 
[ET_NET 19] DEBUG: (parent_select) ParentConsistentHash::selectParent(): Using 
a consistent hash parent selection strategy.
   Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] 
[ET_NET 19] DEBUG: (parent_select) Initial parent lookups: 1
   Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] 
[ET_NET 19] DEBUG: (parent_select) Additional parent lookups: 1
   Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] 
[ET_NET 19] DEBUG: (parent_select) Chosen parent: cdn1cdmid0003.coxlab.net.80
   Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] 
[ET_NET 19] DEBUG: (parent_select) PARENT_SPECIFIED
   Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] 
[ET_NET 19] DEBUG: (parent_select) Result for origin.rd.at.cox.net was parent 
cdn1cdmid0003.coxlab.net:80
   Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] 
[ET_NET 0] DEBUG: (parent_select) Parent fail count increased to 1 for 
cdn1cdmid0003.coxlab.net:80
   Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] 
[ET_NET 0] DEBUG: (parent_select) ParentConfigParams::nextParent(): 
parent_table: 0x1934050, result->rec: 0x1a7d498
   Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] 
[ET_NET 0] DEBUG: (parent_select) ParentConfigParams::nextParent(): result->r: 
2, tablePtr: 0x1934050
   Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] 
[ET_NET 0] DEBUG: (parent_select) Calling selectParent() from nextParent
   Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] 
[ET_NET 0] DEBUG: (parent_select) ParentConsistentHash::selectParent(): Using a 
consistent hash parent selection strategy.
   Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] 
[ET_NET 0] DEBUG: (parent_select) Initial parent lookups: 1
   Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] 
[ET_NET 0] DEBUG: (parent_select) Additional parent lookups: 1
   Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] 
[ET_NET 0] DEBUG: (parent_select) Chosen parent: cdn1cdmid0004.coxlab.net.80
   Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] 
[ET_NET 0] DEBUG: (parent_select) Retry result for origin.rd.at.cox.net was 
parent cdn1cdmid0004.coxlab.net:80
   Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] 
[ET_NET 19] DEBUG: (parent_select) Parent fail count increased to 2 for 
cdn1cdmid0003.coxlab.net:80
   Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] 
[ET_NET 19] DEBUG: (parent_select) ParentConfigParams::nextParent(): 
parent_table: 0x1934050, result->rec: 0x1a7d498
   Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] 
[ET_NET 19] DEBUG: (parent_select) ParentConfigParams::nextParent(): result->r: 
2, tablePtr: 0x1934050
   Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] 
[ET_NET 19] DEBUG: (parent_select) Calling selectParent() from nextParent
   Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] 
[ET_NET 19] DEBUG: (parent_select) ParentConsistentHash::selectParent(): Using 
a consistent hash parent selection strategy.
   Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] 
[ET_NET 19] DEBUG: (parent_select) Initial parent lookups: 1
   Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] 
[ET_NET 19] DEBUG: (parent_select) Additional parent lookups: 1
   Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] 
[ET_NET 19] DEBUG: (parent_select) Chosen parent: cdn1cdmid0004.coxlab.net.80
   Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] 
[ET_NET 19] DEBUG: (parent_select) Retry result for origin.rd.at.cox.net was 
parent cdn1cdmid0004.coxlab.net:80
   ```
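
To put numbers on the two logs, the sketch below computes the time between the first parent being chosen ("Result for ... was parent ...") and the following "Parent fail count increased" lines. The timestamps are copied from the logs above. The helper name `gap_seconds` is hypothetical, and the year is ignored because the log lines do not carry one.

```python
from datetime import datetime

# Matches ATS debug-log timestamps such as "Apr 16 13:46:46.662".
FMT = "%b %d %H:%M:%S.%f"


def gap_seconds(start, end):
    """Seconds elapsed between two log timestamps on the same day or later."""
    t0 = datetime.strptime(start, FMT)
    t1 = datetime.strptime(end, FMT)
    return (t1 - t0).total_seconds()


# ATS7: parent chosen -> fail count increased to 1
ats7 = gap_seconds("Apr 16 13:46:46.662", "Apr 16 13:46:49.629")
# ATS8: parent chosen -> fail count increased to 1, then to 2
ats8_first = gap_seconds("Apr 16 13:44:37.160", "Apr 16 13:45:28.719")
ats8_second = gap_seconds("Apr 16 13:44:37.160", "Apr 16 13:45:37.714")

print(f"ATS7 first failure after  {ats7:.3f}s")         # 2.967s
print(f"ATS8 first failure after  {ats8_first:.3f}s")   # 51.559s
print(f"ATS8 second failure after {ats8_second:.3f}s")  # 60.554s
```

On ATS7 the first parent is marked down about 3 seconds after selection. On ATS8 the first failure is logged about 51.6 seconds after selection, and the second at about 60.6 seconds, which lines up with the 60.567-second TTFB that curl reported.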
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

