rob05c opened a new pull request, #7096:
URL: https://github.com/apache/trafficcontrol/pull/7096
Adds parent health to the health client. This lets caches log parent health and mark parents down based on whether the cache can reach its parents directly, in addition to the existing Traffic Monitor health.

Intra-datacenter network issues can leave a child cache unable to reach its parent, even though both are reachable from the Traffic Monitor. When this happens, we implicitly rely on the Apache Traffic Server (ATS) parent health and markdown system. That system is reactive: clients typically time out before ATS can mark the parent down and retry a different one. Direct parent health provides faster, proactive markdown during these intra-datacenter network events.

This PR also significantly refactors the health client. Previously it was a single thread running one loop that reloaded config, refreshed Traffic Ops (TO) data, fetched Traffic Monitor (TM) health, and applied markdowns. Now each of these operations runs as its own goroutine (microthread) service, because a single-threaded work loop was not feasible given the amount of work parent health polling adds.

It adds three parent health mechanisms:

- **L4 health:** a TCP SYN-ACK-RST check.
- **L7 health:** a successful HTTP response from the parent.
- **Meta-parent poll:** polls the parent's own health client for its parent health, and applies a heuristic based on the parents that parent reports as unavailable.

All new parent health mechanisms are disabled by default and should be considered experimental.

## Which Traffic Control components are affected by this PR?

- Traffic Control Health Client (tc-health-client)

## What is the best way to verify this PR?

Enable parent health in the health client, then observe the logs and markdowns.

## If this is a bugfix, which Traffic Control versions contained the bug?

Not a bug fix.

## PR submission checklist

- [x] This PR has tests
- [x] This PR has documentation
- [x] This PR has a CHANGELOG.md entry
- [x] This PR **DOES NOT FIX A SERIOUS SECURITY VULNERABILITY** (see [the Apache Software Foundation's security guidelines](https://apache.org/security) for details)
