rob05c opened a new pull request, #7096:
URL: https://github.com/apache/trafficcontrol/pull/7096

   Adds parent health to the health client.
   
   This lets a cache log parent health and mark down parents based on whether 
the cache itself can reach its parents directly, in addition to the existing 
Traffic Monitor health.
   
   Intra-datacenter network issues can leave a child cache unable to reach its 
parent even though both remain reachable from the Traffic Monitor. When this 
happens, we implicitly rely on the ATS parent health and markdown system, which 
is reactive: clients typically time out before ATS marks the parent down and 
retries through a different parent. Direct parent health provides faster, 
proactive markdown during these intra-datacenter network events.
   
   This PR also significantly refactors the health client. Previously it was a 
single thread running a loop that reloaded config, refreshed Traffic Ops (TO) 
data, fetched Traffic Monitor (TM) health, and applied markdowns. Each 
operation now runs as its own goroutine service, because a single-threaded work 
loop was not feasible given the amount of work parent health polling adds.
   
   It adds three health mechanisms: L4 health (a TCP SYN, SYN-ACK, RST 
exchange), L7 health (a successful HTTP response), and a meta-parent poll, 
which queries each parent's own health client for that parent's view of its 
parents and applies a heuristic based on the unavailable parents it reports.
   
   All new parent health mechanisms default to disabled, and should be 
considered experimental.
   
   ## Which Traffic Control components are affected by this PR?
   - Traffic Control Health Client (tc-health-client)
   
   ## What is the best way to verify this PR?
   Enable parent health on the health client, then observe the logs and markdowns.
   
   ## If this is a bugfix, which Traffic Control versions contained the bug?
   Not a bug fix
   
   
   ## PR submission checklist
   - [x] This PR has tests
   - [x] This PR has documentation
   - [x] This PR has a CHANGELOG.md entry
   - [x] This PR **DOES NOT FIX A SERIOUS SECURITY VULNERABILITY** (see [the 
Apache Software Foundation's security guidelines](https://apache.org/security) 
for details)
   
   

