zrhoffman commented on code in PR #7302:
URL: https://github.com/apache/trafficcontrol/pull/7302#discussion_r1083012036
##########
traffic_monitor/cache/stats_over_http.json_new:
##########
@@ -0,0 +1,539 @@
+{
Review Comment:
From Weasel:
```
Error Unknown!
traffic_monitor/cache/stats_over_http.json_new
```
Is `stats_over_http.json_new` needed?
##########
traffic_monitor/cache/astats.go:
##########
@@ -116,8 +117,23 @@ func astatsParse(cacheName string, rdr io.Reader, pollCTX
interface{}) (Statisti
astats.Ats["system.proc.loadavg"] = astats.System.ProcLoadavg
astats.Ats["system.proc.net.dev"] = astats.System.ProcNetDev
+ via := ctx.HTTPHeader.Get("Via")
Review Comment:
Please add the `Via` header to `rfc` package and include the RFC for it:
https://github.com/apache/trafficcontrol/blob/0321d207699d390fdfa6067616ef955d084628ff/lib/go-rfc/http.go#L31-L34
##########
traffic_monitor/cache/astats.go:
##########
@@ -116,8 +117,23 @@ func astatsParse(cacheName string, rdr io.Reader, pollCTX
interface{}) (Statisti
astats.Ats["system.proc.loadavg"] = astats.System.ProcLoadavg
astats.Ats["system.proc.net.dev"] = astats.System.ProcNetDev
+ via := ctx.HTTPHeader.Get("Via")
+ if via != "" {
+ result := regexp.MustCompile(`
([a-z0-9\-]*)\..*comcast.net`).FindStringSubmatch(via)
+ if len(result) > 0 {
+ astats.Ats["via"] = result[1]
Review Comment:
Should this be `result[0]`?
##########
traffic_monitor/datareq/crstate.go:
##########
@@ -21,8 +21,10 @@ package datareq
import (
"fmt"
+ "github.com/apache/trafficcontrol/traffic_monitor/todata"
Review Comment:
Nit: Any import starting with `"github.com/apache/trafficcontrol` should be
grouped with the other imports that start with
`"github.com/apache/trafficcontrol`.
##########
docs/source/overview/traffic_monitor.rst:
##########
@@ -56,3 +56,6 @@ Protocol Engagement
-------------------
Short polling intervals of both the :term:`cache servers` and Traffic Monitor
peers help to reduce customer impact of outages. It is not uncommon for a
:term:`cache server` to be marked unavailable by Traffic Monitor - in fact, it
is business as usual for many CDNs. Should a widely requested video asset cause
a single :term:`cache server` to get close to its interface capacity, the
Health Protocol will "kick in," and Traffic Monitor marks the :term:`cache
server` as unavailable. New clients want to see the same asset, and now
:ref:`tr-overview` will send these customers to another :term:`cache server` in
the same :term:`Cache Group`. The load is now shared between the two
:term:`cache servers`. As clients finish watching the asset on the overloaded
:term:`cache server`, it will drop below the threshold and gets marked
available again, and new clients will begin to be directed to it once more. It
is less common for a :term:`Delivery Service` to be marked unavailable by
Traffic Monito
r. The :term:`Delivery Service` thresholds are usually used for overflow
situations at extreme peaks to protect other :term:`Delivery Services` in the
CDN from being impacted.
+Handling Caches with same service ip
+-------------------
Review Comment:
As the GitHub Actions annotation says,
> ```
> Title underline too short.
> ```
##########
traffic_monitor/cache/stats_over_http.go:
##########
@@ -72,6 +73,14 @@
ctx := pollCTX.(*poller.HTTPPollCtx)
+ via := ctx.HTTPHeader.Get("Via")
+ if via != "" {
+ result := regexp.MustCompile(`
([a-z0-9\-]*)\..*comcast.net`).FindStringSubmatch(via)
Review Comment:
As CodeQL says, `.net` should be `\.net` (here and in the other 2 places)
##########
traffic_monitor/datareq/crstate.go:
##########
@@ -84,10 +88,55 @@ func filterDirectlyPolledCaches(crstates tc.CRStates)
tc.CRStates {
return filtered
}
-func srvTRStateSelf(localStates peer.CRStatesThreadsafe, directlyPolledOnly
bool) ([]byte, error) {
+func srvTRStateSelf(localStates peer.CRStatesThreadsafe, directlyPolledOnly
bool, toData todata.TODataThreadsafe) ([]byte, error) {
if !directlyPolledOnly {
- return tc.CRStatesMarshall(localStates.Get())
+ localStatesC := updateStatusAnycast(localStates, toData)
+ return tc.CRStatesMarshall(localStatesC)
}
- unfiltered := localStates.Get()
+ unfiltered := updateStatusAnycast(localStates, toData)
return tc.CRStatesMarshall(filterDirectlyPolledCaches(unfiltered))
}
+
+func updateStatusAnycast(localStates peer.CRStatesThreadsafe, toData
todata.TODataThreadsafe) tc.CRStates {
+ localStatesC := localStates.Get()
+ toDataC := toData.Get()
+
+ for cache, _ := range localStatesC.Caches {
+ if _, ok := toDataC.ServerPartners[cache]; ok {
+ // all server partners must be available if they are in
reported state
+ allAvailableV4 := true
+ allAvailableV6 := true
+ allIsAvailable := true
+ for partner, _ := range toDataC.ServerPartners[cache] {
+ if partnerState, ok :=
localStatesC.Caches[partner]; ok {
+ // a partner host is reported but is
marked down for too high traffic or load
+ // this host also needs to be marked
down to divert all traffic for their
+ // common anycast ip
+ if
strings.Contains(partnerState.Status, "REPORTED") &&
+
strings.Contains(partnerState.Status, "too high") {
+ if !partnerState.Ipv4Available {
+ allAvailableV4 = false
+ }
+ if !partnerState.Ipv6Available {
Review Comment:
Same here, this should check if `allAvailableV6` is still `true` and only
set it to `false` if it is.
##########
docs/source/overview/traffic_monitor.rst:
##########
@@ -56,3 +56,6 @@ Protocol Engagement
-------------------
Short polling intervals of both the :term:`cache servers` and Traffic Monitor
peers help to reduce customer impact of outages. It is not uncommon for a
:term:`cache server` to be marked unavailable by Traffic Monitor - in fact, it
is business as usual for many CDNs. Should a widely requested video asset cause
a single :term:`cache server` to get close to its interface capacity, the
Health Protocol will "kick in," and Traffic Monitor marks the :term:`cache
server` as unavailable. New clients want to see the same asset, and now
:ref:`tr-overview` will send these customers to another :term:`cache server` in
the same :term:`Cache Group`. The load is now shared between the two
:term:`cache servers`. As clients finish watching the asset on the overloaded
:term:`cache server`, it will drop below the threshold and gets marked
available again, and new clients will begin to be directed to it once more. It
is less common for a :term:`Delivery Service` to be marked unavailable by
Traffic Monito
r. The :term:`Delivery Service` thresholds are usually used for overflow
situations at extreme peaks to protect other :term:`Delivery Services` in the
CDN from being impacted.
+Handling Caches with same service ip
Review Comment:
Nit: `IP` should be capitalized
##########
traffic_monitor/datareq/crstate.go:
##########
@@ -84,10 +88,55 @@ func filterDirectlyPolledCaches(crstates tc.CRStates)
tc.CRStates {
return filtered
}
-func srvTRStateSelf(localStates peer.CRStatesThreadsafe, directlyPolledOnly
bool) ([]byte, error) {
+func srvTRStateSelf(localStates peer.CRStatesThreadsafe, directlyPolledOnly
bool, toData todata.TODataThreadsafe) ([]byte, error) {
if !directlyPolledOnly {
- return tc.CRStatesMarshall(localStates.Get())
+ localStatesC := updateStatusAnycast(localStates, toData)
+ return tc.CRStatesMarshall(localStatesC)
}
- unfiltered := localStates.Get()
+ unfiltered := updateStatusAnycast(localStates, toData)
return tc.CRStatesMarshall(filterDirectlyPolledCaches(unfiltered))
}
+
+func updateStatusAnycast(localStates peer.CRStatesThreadsafe, toData
todata.TODataThreadsafe) tc.CRStates {
+ localStatesC := localStates.Get()
+ toDataC := toData.Get()
+
+ for cache, _ := range localStatesC.Caches {
+ if _, ok := toDataC.ServerPartners[cache]; ok {
+ // all server partners must be available if they are in
reported state
+ allAvailableV4 := true
+ allAvailableV6 := true
+ allIsAvailable := true
+ for partner, _ := range toDataC.ServerPartners[cache] {
+ if partnerState, ok :=
localStatesC.Caches[partner]; ok {
+ // a partner host is reported but is
marked down for too high traffic or load
+ // this host also needs to be marked
down to divert all traffic for their
+ // common anycast ip
+ if
strings.Contains(partnerState.Status, "REPORTED") &&
+
strings.Contains(partnerState.Status, "too high") {
+ if !partnerState.Ipv4Available {
Review Comment:
This should check if `allAvailableV4` is still `true` and only set it to
`false` if it is.
##########
traffic_monitor/datareq/crstate.go:
##########
@@ -84,10 +88,55 @@ func filterDirectlyPolledCaches(crstates tc.CRStates)
tc.CRStates {
return filtered
}
-func srvTRStateSelf(localStates peer.CRStatesThreadsafe, directlyPolledOnly
bool) ([]byte, error) {
+func srvTRStateSelf(localStates peer.CRStatesThreadsafe, directlyPolledOnly
bool, toData todata.TODataThreadsafe) ([]byte, error) {
if !directlyPolledOnly {
- return tc.CRStatesMarshall(localStates.Get())
+ localStatesC := updateStatusAnycast(localStates, toData)
+ return tc.CRStatesMarshall(localStatesC)
}
- unfiltered := localStates.Get()
+ unfiltered := updateStatusAnycast(localStates, toData)
return tc.CRStatesMarshall(filterDirectlyPolledCaches(unfiltered))
}
+
+func updateStatusAnycast(localStates peer.CRStatesThreadsafe, toData
todata.TODataThreadsafe) tc.CRStates {
+ localStatesC := localStates.Get()
+ toDataC := toData.Get()
+
+ for cache, _ := range localStatesC.Caches {
+ if _, ok := toDataC.ServerPartners[cache]; ok {
+ // all server partners must be available if they are in
reported state
+ allAvailableV4 := true
+ allAvailableV6 := true
+ allIsAvailable := true
+ for partner, _ := range toDataC.ServerPartners[cache] {
+ if partnerState, ok :=
localStatesC.Caches[partner]; ok {
+ // a partner host is reported but is
marked down for too high traffic or load
+ // this host also needs to be marked
down to divert all traffic for their
+ // common anycast ip
+ if
strings.Contains(partnerState.Status, "REPORTED") &&
+
strings.Contains(partnerState.Status, "too high") {
+ if !partnerState.Ipv4Available {
+ allAvailableV4 = false
+ }
+ if !partnerState.Ipv6Available {
+ allAvailableV6 = false
+ }
+ if !partnerState.IsAvailable {
+ allIsAvailable = false
+ }
Review Comment:
We can break here if `partnerState.Ipv4Available`,
`partnerState.Ipv6Available`, and `partnerState.IsAvailable` are all `false`,
right?
##########
lib/go-tc/crstates.go:
##########
@@ -44,6 +44,7 @@ type IsAvailable struct {
DirectlyPolled bool `json:"-"`
Status string `json:"status"`
LastPoll time.Time `json:"lastPoll"`
+ LastPollv6 time.Time `json:"lastPollv6"`
Review Comment:
It's true that `v` is not capitalized in `IPv6`, but variables are
capitalized according to CamelCase, so `V` should be capitalized in both Go and
JSON fields.
##########
docs/source/overview/traffic_monitor.rst:
##########
@@ -56,3 +56,6 @@ Protocol Engagement
-------------------
Short polling intervals of both the :term:`cache servers` and Traffic Monitor
peers help to reduce customer impact of outages. It is not uncommon for a
:term:`cache server` to be marked unavailable by Traffic Monitor - in fact, it
is business as usual for many CDNs. Should a widely requested video asset cause
a single :term:`cache server` to get close to its interface capacity, the
Health Protocol will "kick in," and Traffic Monitor marks the :term:`cache
server` as unavailable. New clients want to see the same asset, and now
:ref:`tr-overview` will send these customers to another :term:`cache server` in
the same :term:`Cache Group`. The load is now shared between the two
:term:`cache servers`. As clients finish watching the asset on the overloaded
:term:`cache server`, it will drop below the threshold and gets marked
available again, and new clients will begin to be directed to it once more. It
is less common for a :term:`Delivery Service` to be marked unavailable by
Traffic Monito
r. The :term:`Delivery Service` thresholds are usually used for overflow
situations at extreme peaks to protect other :term:`Delivery Services` in the
CDN from being impacted.
+Handling Caches with same service ip
+-------------------
+Traffic monitor is able to direct traffic router to route traffic to groups of
:term:`cache servers` with the same service ip address that is routed with qual
cost multi path(ECMP) on the last hop, but not to individual :term:`cache
server` within that group. To monitor :term:`cache servers` that within that
group, http keep-alive header is omitted with the ``health.polling.keepalive``
and ``stat.polling.keepalive`` traffic monitor profile parameter so that the TM
source port changes for those connections and the TM can get to all the servers
within that group via ECMP. That it updates the responses using the via header.
In the event of :dfn:`Health Protocol` determining the overall health of one of
the :term:`cache servers` in a anycast group by evaluating network throughput
and load against values configured in :ref:`to-overview` to exceed their limit,
it declares the whole groups as unhealthy.
Review Comment:
Nits: `IP` and `HTTP` should be capitalized.
##########
traffic_monitor/datareq/crstate.go:
##########
@@ -84,10 +88,55 @@ func filterDirectlyPolledCaches(crstates tc.CRStates)
tc.CRStates {
return filtered
}
-func srvTRStateSelf(localStates peer.CRStatesThreadsafe, directlyPolledOnly
bool) ([]byte, error) {
+func srvTRStateSelf(localStates peer.CRStatesThreadsafe, directlyPolledOnly
bool, toData todata.TODataThreadsafe) ([]byte, error) {
if !directlyPolledOnly {
- return tc.CRStatesMarshall(localStates.Get())
+ localStatesC := updateStatusAnycast(localStates, toData)
+ return tc.CRStatesMarshall(localStatesC)
}
- unfiltered := localStates.Get()
+ unfiltered := updateStatusAnycast(localStates, toData)
return tc.CRStatesMarshall(filterDirectlyPolledCaches(unfiltered))
}
+
+func updateStatusAnycast(localStates peer.CRStatesThreadsafe, toData
todata.TODataThreadsafe) tc.CRStates {
+ localStatesC := localStates.Get()
+ toDataC := toData.Get()
+
+ for cache, _ := range localStatesC.Caches {
+ if _, ok := toDataC.ServerPartners[cache]; ok {
+ // all server partners must be available if they are in
reported state
+ allAvailableV4 := true
+ allAvailableV6 := true
+ allIsAvailable := true
+ for partner, _ := range toDataC.ServerPartners[cache] {
+ if partnerState, ok :=
localStatesC.Caches[partner]; ok {
+ // a partner host is reported but is
marked down for too high traffic or load
+ // this host also needs to be marked
down to divert all traffic for their
+ // common anycast ip
+ if
strings.Contains(partnerState.Status, "REPORTED") &&
+
strings.Contains(partnerState.Status, "too high") {
+ if !partnerState.Ipv4Available {
+ allAvailableV4 = false
+ }
+ if !partnerState.Ipv6Available {
+ allAvailableV6 = false
+ }
+ if !partnerState.IsAvailable {
Review Comment:
Like the others, this should check if `allIsAvailable` is still `true` and
only set it to `false` if it is.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]