On 2/10/22 19:42, John Todd wrote:
I think it would be fair to say that ICMP echo to easy-to-remember
internet resources is tolerated, but not encouraged, and is probably
not a good idea unless one knows and very well understands the
implications of failure (or success!) modes that don’t match the
conditions that are expected. Terrible monitoring is easy; good
monitoring is quite difficult.
It is reasonable to expect operators of systems designed for one type
of service to quickly rate-limit or entirely filter non-critical
alternate capabilities in the event of resource exhaustion or other
type of risk to the primary service - this applies to web servers, DNS
servers, NTP servers, etc. Also, choosing as an indicator a response
from a protocol such as ICMP echo/reply, which has a history of
flooding attacks and may have its traffic rapidly clamped, is
another large check mark in the “do not use” column. ICMP
echo runs a real risk of not providing expected results for reasons
that are known only to the target operator, and which do not take your
non-obvious intentions into consideration.
More central to the issue: “The Prudent Mariner never relies solely on
any single aid to navigation” (hi, Ken!) is an applicable quote here.
Nothing is immune from interruption of service, especially as it
becomes more distant from your administrative control. I see all too
often people using ICMP to a nameserver, or a query to a nameserver,
or a socket request to port 80 of some well-known name as the only
method utilized for determining if a larger set of systems is
available. This is not typically a good idea. I shudder to think what
would happen if certain well-known domains were to be unavailable due
to one of a dozen different potential failure cases. There are far too
many poorly-written stacks that assume some singular conditions are
“impossible” unless as a result of local failure, and that always ends
in sadness and late nights spent writing root-cause analysis reports.
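As a rough illustration of not relying on any single aid, here is a minimal Python sketch that runs several independent probes and refuses to give a verdict from too few of them. The names (run_probes, verdict, min_probes) and the threshold of three are assumptions made up for this example, not anything from an existing tool; each probe is just a callable returning True or False.

```python
from typing import Callable, Dict

# Hypothetical probe type: a zero-argument callable returning True on success.
Probe = Callable[[], bool]


def run_probes(probes: Dict[str, Probe]) -> Dict[str, bool]:
    """Run every probe; an exception counts as a failed probe, not a crash."""
    results: Dict[str, bool] = {}
    for name, probe in probes.items():
        try:
            results[name] = bool(probe())
        except Exception:
            results[name] = False
    return results


def verdict(results: Dict[str, bool], min_probes: int = 3) -> str:
    """Collapse several independent results into one hedged verdict."""
    if len(results) < min_probes:
        # Too few independent signals to say anything useful.
        return "inconclusive"
    ok = sum(1 for passed in results.values() if passed)
    if ok == len(results):
        return "reachable"
    if ok == 0:
        return "unreachable"
    # Mixed results: a hint for a human, not a basis for automatic action.
    return "partial"
```

A single passing probe yields "inconclusive" here by design, and mixed results are reported as "partial" rather than being forced into up or down.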
Further adding to this complexity is the benefit or detraction of
anycast for many of these larger public services. What is “up” and
what is “down”? What is the signal generated or inferred by presence
or absence of this monitoring sample? The question typically generates
lively debate within a network or monitoring team. I am pretty sure
that “But I could ping x.x.x.x” is not typically a statement that has
much weight when considering overall reachability. I do admit it is a
hint, but not the answer, for many network conditions, and no system
should consider that result, by itself, canonical for
anything other than that exact result.
If one is going to use responses of exterior (not within the same
organizational control) services as an indicator of reachability, then
a broad spectrum of tests is probably the only way to have anything
approaching certainty or knowledge upon which action could be based,
and even that will always have a shadow of a doubt. In that mix, ICMP
echo/reply to public nameservers is probably not the best indicator to
add in a monitoring suite, though it may appear to be perfectly OK…
until it isn’t. DNS queries to DNS servers seem to be the most
reasonable thing to use as test material, rather than ICMP, if one
were building a rickety monitoring house out of the resources at-hand.
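To make the DNS-query-as-test idea concrete, here is a small Python sketch using only the standard library. It builds a single-question query in the RFC 1035 wire format, sends it over UDP, and classifies the reply by its header. The function names, the fixed transaction ID, and the result strings are all assumptions for illustration; a real monitor would randomize the ID, handle truncation and TCP fallback, and validate label lengths.

```python
import socket
import struct


def build_query(name: str, qtype: int = 1, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query for one question (default type A, class IN)."""
    # Header: ID, flags (only RD set = 0x0100), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def check_response(data: bytes, txid: int = 0x1234) -> str:
    """Classify a reply using only the 12-byte DNS header."""
    if len(data) < 12:
        return "malformed"
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", data[:12])
    if rid != txid or not (flags & 0x8000):  # wrong ID, or QR bit not set
        return "mismatch"
    rcode = flags & 0x000F
    if rcode != 0:
        return f"rcode-{rcode}"
    return "answered" if ancount > 0 else "empty"


def probe_dns(server: str, name: str, timeout: float = 2.0) -> str:
    """Send one UDP query to server:53 and report what came back."""
    query = build_query(name)
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(query, (server, 53))
            data, _addr = sock.recvfrom(512)
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
    return check_response(data)
```

Calling something like probe_dns("192.0.2.53", "example.com") (a documentation address standing in for whichever resolver you actually test against) returns a string such as "answered", "rcode-2", or "timeout". That is more informative than an echo reply, but it is still only one sample among the many you would want.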
Additionally: The suggestions of building some new ICMP-responding
service may end up being counter to the goals of the people using
external tests, so be careful what is wished for. Witness everyone
installing various “speed testing” servers in their own networks,
which may not truly provide accurate measurements of anything other
than local loop speeds, which now sort of defeats the purpose of the
speed test for anything other than the most local set of results.
Well, the issue is being able to test at the lowest level possible, and
with the lowest common denominator.
While I agree that testing an application (like DNS) makes sense, it is
not simple, and is a lot higher up in the layers, where a lot more
things need to work reasonably well for that test to be successful.
Ping, on the other hand, is so basic that, barring rate-limiting or
outright blocking, it is a decent indicator of liveness between source
and destination.
I'm all for pulling together as many tools as we can to detect
liveness, for I don't see Ping going anywhere, anytime soon.
Heck, e-mail has been "dying" for decades, and yet it's still here.
Mark.