Copilot commented on code in PR #13092:
URL: https://github.com/apache/trafficserver/pull/13092#discussion_r3108375908


##########
include/iocore/hostdb/HostDBProcessor.h:
##########
@@ -123,61 +123,76 @@ enum class HostDBType : uint8_t {
 };
 
 /** Information about a single target.
+ *
+ * Each instance tracks the health state of one upstream address. The state is 
derived from @c last_failure and the caller-supplied
+ * @a fail_window:
+ *
+ * | State   | Description                                                     
                 |
+ * 
|---------|----------------------------------------------------------------------------------|
+ * | Up      | No known failure; eligible for normal selection.                
                 |
+ * | Down    | Blocked; no connections permitted until @c last_failure + @a 
fail_window elapses |
+ * | Suspect | Fail window has elapsed; connections are permitted.             
                 |
+ * |         | On success transitions to Up (@c mark_up); on failure returns 
to Down.           |
+ *

Review Comment:
   The HostDBInfo class-level comment still references `last_failure`, but that 
member no longer exists (and the state is now derived via `last_fail_time()` / 
`_last_failure`). This can cause Doxygen warnings and is confusing for readers; 
please update the text/table/chart labels accordingly.



##########
src/iocore/hostdb/unit_tests/test_HostDBInfo.cc:
##########
@@ -0,0 +1,214 @@
+/** @file
+
+  Unit tests for HostDBInfo state transitions (UP / DOWN / SUSPECT).
+
+  @section license License
+
+    Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+ */
+
+#include <catch2/catch_test_macros.hpp>
+
+#include "iocore/hostdb/HostDBProcessor.h"
+#include "tscore/ink_time.h"
+
+// ---------------------------------------------------------------------------
+// Helpers
+// ---------------------------------------------------------------------------
+
+// fail_window used throughout the tests (matches down_server.cache_time 
default).

Review Comment:
   The FAIL_WINDOW constant is set to 300s, but the comment says it matches the 
`proxy.config.http.down_server.cache_time` default. In the current records 
config the default is 60s, so either adjust the constant or change the comment 
to avoid misinformation.
   ```suggestion
   // fail_window used throughout the tests.
   ```



##########
src/iocore/hostdb/HostDBInfo.cc:
##########
@@ -95,3 +97,154 @@ HostDBInfo::srvname() const
 {
   return data.srv.srv_offset ? reinterpret_cast<char const *>(this) + 
data.srv.srv_offset : nullptr;
 }
+
+HostDBInfo &
+HostDBInfo::operator=(HostDBInfo const &that)
+{
+  if (this != &that) {
+    memcpy(static_cast<void *>(this), static_cast<const void *>(&that), 
sizeof(*this));
+  }
+  return *this;
+}
+
+ts_time
+HostDBInfo::last_fail_time() const
+{
+  return _last_failure;
+}
+
+uint8_t
+HostDBInfo::fail_count() const
+{
+  return _fail_count;
+}
+
+HostDBInfo::State
+HostDBInfo::state(ts_time now, ts_seconds fail_window) const
+{
+  auto last_fail = this->last_fail_time();
+  if (last_fail == TS_TIME_ZERO) {
+    return State::UP;
+  }
+
+  if (now <= last_fail + fail_window) {
+    return State::DOWN;
+  } else {
+    return State::SUSPECT;
+  }
+}
+
+bool
+HostDBInfo::is_up() const
+{
+  return this->last_fail_time() == TS_TIME_ZERO;
+}
+
+bool
+HostDBInfo::is_down(ts_time now, ts_seconds fail_window) const
+{
+  return this->state(now, fail_window) == State::DOWN;
+}
+
+bool
+HostDBInfo::is_suspect(ts_time now, ts_seconds fail_window) const
+{
+  return this->state(now, fail_window) == State::SUSPECT;
+}
+
+/** Mark the target as UP
+ *
+ * @return @c true if the target was previously DOWN or SUSPECT (i.e., a state 
change occurred).
+ */
+bool
+HostDBInfo::mark_up()
+{
+  auto t        = _last_failure.exchange(TS_TIME_ZERO);
+  bool was_down = t != TS_TIME_ZERO;
+  if (was_down) {
+    _fail_count.store(0);
+  }
+  return was_down;

Review Comment:
   `mark_up()` only resets `_fail_count` when `_last_failure` was non-zero. 
However, `mark_up()` is called on successful connects even when the target was 
already UP; in that case any accumulated `_fail_count` from earlier 
intermittent failures will not be cleared, so non-consecutive failures can 
eventually trip the `connect_attempts_rr_retries` threshold and mark the host 
DOWN unexpectedly. Consider resetting `_fail_count` on every successful 
`mark_up()` (even if it returns false / no state change), or otherwise defining 
and documenting that fail_count is intended to be cumulative across successes.



##########
src/iocore/hostdb/unit_tests/test_HostDBInfo.cc:
##########
@@ -0,0 +1,214 @@
+/** @file
+
+  Unit tests for HostDBInfo state transitions (UP / DOWN / SUSPECT).
+
+  @section license License
+
+    Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+ */
+
+#include <catch2/catch_test_macros.hpp>
+
+#include "iocore/hostdb/HostDBProcessor.h"
+#include "tscore/ink_time.h"
+
+// ---------------------------------------------------------------------------
+// Helpers
+// ---------------------------------------------------------------------------
+
+// fail_window used throughout the tests (matches down_server.cache_time 
default).
+static const ts_seconds FAIL_WINDOW{300};
+
+static const ts_time T0 = ts_clock::now(); ///< A fixed anchor in time;
+
+static const ts_time T1 = T0 + ts_seconds(1);               ///< A time within 
FAIL_WINDOW from T0
+static const ts_time T2 = T0 + FAIL_WINDOW + ts_seconds(1); ///< A time past 
FAIL_WINDOW from T0
+
+static const ts_time T3 = T2 + ts_seconds(1);               ///< A time within 
FAIL_WINDOW from T2
+static const ts_time T4 = T3 + FAIL_WINDOW + ts_seconds(1); ///< A time past 
FAIL_WINDOW from T

Review Comment:
   Minor comment typo: "past FAIL_WINDOW from T" looks truncated (likely meant 
"from T3").
   ```suggestion
   static const ts_time T4 = T3 + FAIL_WINDOW + ts_seconds(1); ///< A time past 
FAIL_WINDOW from T3
   ```



##########
src/iocore/hostdb/unit_tests/test_HostDBInfo.cc:
##########
@@ -0,0 +1,214 @@
+/** @file
+
+  Unit tests for HostDBInfo state transitions (UP / DOWN / SUSPECT).
+
+  @section license License
+
+    Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+ */
+
+#include <catch2/catch_test_macros.hpp>
+
+#include "iocore/hostdb/HostDBProcessor.h"
+#include "tscore/ink_time.h"
+
+// ---------------------------------------------------------------------------
+// Helpers
+// ---------------------------------------------------------------------------
+
+// fail_window used throughout the tests (matches down_server.cache_time 
default).
+static const ts_seconds FAIL_WINDOW{300};
+
+static const ts_time T0 = ts_clock::now(); ///< A fixed anchor in time;
+
+static const ts_time T1 = T0 + ts_seconds(1);               ///< A time within 
FAIL_WINDOW from T0
+static const ts_time T2 = T0 + FAIL_WINDOW + ts_seconds(1); ///< A time past 
FAIL_WINDOW from T0
+
+static const ts_time T3 = T2 + ts_seconds(1);               ///< A time within 
FAIL_WINDOW from T2
+static const ts_time T4 = T3 + FAIL_WINDOW + ts_seconds(1); ///< A time past 
FAIL_WINDOW from T
+// ---------------------------------------------------------------------------
+// Tests
+// ---------------------------------------------------------------------------
+
+TEST_CASE("HostDBInfo state", "[hostdb]")
+{
+  SECTION("initial state is UP")
+  {
+    HostDBInfo info;
+    REQUIRE(info.state(T0, FAIL_WINDOW) == HostDBInfo::State::UP);

Review Comment:
   PR description says the new unit tests cover "concurrent CAS races", but 
this test file appears to be purely single-threaded (no concurrent 
mark_down/mark_up/increment_fail_count races are exercised). Either add a 
concurrency-focused test section (e.g., multiple threads contending on 
mark_down/increment_fail_count) or adjust the PR description to match what’s 
actually covered.



##########
src/iocore/hostdb/HostDBInfo.cc:
##########
@@ -95,3 +97,154 @@ HostDBInfo::srvname() const
 {
   return data.srv.srv_offset ? reinterpret_cast<char const *>(this) + 
data.srv.srv_offset : nullptr;
 }
+
+HostDBInfo &
+HostDBInfo::operator=(HostDBInfo const &that)
+{
+  if (this != &that) {
+    memcpy(static_cast<void *>(this), static_cast<const void *>(&that), 
sizeof(*this));
+  }
+  return *this;
+}
+
+ts_time
+HostDBInfo::last_fail_time() const
+{
+  return _last_failure;
+}
+
+uint8_t
+HostDBInfo::fail_count() const
+{
+  return _fail_count;
+}
+
+HostDBInfo::State
+HostDBInfo::state(ts_time now, ts_seconds fail_window) const
+{
+  auto last_fail = this->last_fail_time();
+  if (last_fail == TS_TIME_ZERO) {
+    return State::UP;
+  }
+
+  if (now <= last_fail + fail_window) {
+    return State::DOWN;
+  } else {
+    return State::SUSPECT;
+  }
+}
+
+bool
+HostDBInfo::is_up() const
+{
+  return this->last_fail_time() == TS_TIME_ZERO;
+}
+
+bool
+HostDBInfo::is_down(ts_time now, ts_seconds fail_window) const
+{
+  return this->state(now, fail_window) == State::DOWN;
+}
+
+bool
+HostDBInfo::is_suspect(ts_time now, ts_seconds fail_window) const
+{
+  return this->state(now, fail_window) == State::SUSPECT;
+}
+
+/** Mark the target as UP
+ *
+ * @return @c true if the target was previously DOWN or SUSPECT (i.e., a state 
change occurred).
+ */
+bool
+HostDBInfo::mark_up()
+{
+  auto t        = _last_failure.exchange(TS_TIME_ZERO);
+  bool was_down = t != TS_TIME_ZERO;
+  if (was_down) {
+    _fail_count.store(0);
+  }
+  return was_down;
+}
+
+/** Mark the entry as DOWN.
+ *
+ * @param now         Time of the failure.
+ * @param fail_window The fail window duration 
(proxy.config.http.down_server.cache_time).
+ * @return @c true if @a this was marked down, @c false if not.
+ *
+ * Handles two transitions:
+ * - UP → DOWN: @c _last_failure is @c TS_TIME_ZERO; set via CAS.
+ * - SUSPECT → DOWN: @c fail_window has elapsed since the last failure; @c 
_last_failure is
+ *   refreshed via CAS to restart the fail window.
+ *
+ * On a successful transition @c _fail_count is reset to zero so that the next 
SUSPECT window
+ * accumulates failures from a clean baseline.
+ *
+ * Returns @c false if the server is already DOWN (within the active fail 
window), so the
+ * fail window is not refreshed by concurrent failures.
+ */
+bool
+HostDBInfo::mark_down(ts_time now, ts_seconds fail_window)
+{
+  // UP -> DOWN
+  auto t0{TS_TIME_ZERO};
+  if (_last_failure.compare_exchange_strong(t0, now)) {
+    // Reset so the next SUSPECT window starts with a fresh failure count.
+    _fail_count.store(0);
+    return true;
+  }
+
+  // After the failed CAS, t0 holds the current _last_failure value.
+  // SUSPECT -> DOWN: the fail window has elapsed; refresh _last_failure to 
restart it.
+  if (t0 + fail_window < now) {
+    if (_last_failure.compare_exchange_strong(t0, now)) {
+      // Reset so the next SUSPECT window starts with a fresh failure count.
+      _fail_count.store(0);
+      return true;
+    }
+  }
+
+  // Already DOWN; don't refresh the fail window.
+  return false;
+}
+
+/** Increment the connection failure counter and conditionally mark the target 
DOWN.
+ *
+ * @param now         Current time, used as the failure timestamp if the 
target is marked DOWN.
+ * @param max_retries Number of failures that triggers a transition to DOWN.
+ * @param fail_window The fail window duration 
(proxy.config.http.down_server.cache_time).
+ * @return A pair { @c marked_down, @c fail_count } where @c marked_down is @c 
true if this call
+ *         caused the target to transition to DOWN (i.e. @c fail_count just 
reached @a max_retries
+ *         and the @c mark_down CAS succeeded), and @c fail_count is the 
updated counter value.
+ *
+ * @note @c marked_down can be @c false even when @c fail_count >= @a 
max_retries if another
+ *       thread concurrently marked the target DOWN first (the CAS on @c 
_last_failure will fail).
+ */
+std::pair<bool, uint8_t>
+HostDBInfo::increment_fail_count(ts_time now, uint8_t max_retries, ts_seconds 
fail_window)
+{
+  auto fcount      = ++_fail_count;
+  bool marked_down = false;
+
+  Dbg(dbg_ctl_hostdb_info, "fail_count=%d max_reties=%d", fcount, max_retries);

Review Comment:
   Typo in debug message: "max_reties" should be "max_retries" (and ideally 
keep debug output spelling consistent for searching/grepping).
   ```suggestion
     Dbg(dbg_ctl_hostdb_info, "fail_count=%d max_retries=%d", fcount, 
max_retries);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to