Thanks again for all the advice.

No problems today so far. I suspect this was some type of network issue,
which affected only part of the Google network. As noted, we had zero
failures for all domains but this one, and we had no failures sending to
paid Google Workspace domains.

Emanuel:
> This was caused by a temporary issue reaching the DNS servers. Let me
> pass this along to see if it can be turned into a temporary (4xy) error.

Thanks for letting me know. Do you happen to know which name server was
failing?

Benny:
>dont include anything you dont control, unless you own a google server at
>all

That's really kind of funny. We were having some deliverability problems
running our own mail servers for a few domains, so we gave up and switched
to Google Workspace. (You can see my comments to this list in the history
somewhere.) Now when we have problems, I say complain to Google. I have to
point our MX, SPF, and DKIM records at Google. What I won't do is use them
as a DNS provider, but maybe I'll have to if these "temporary DNS failures"
crop up more frequently.

Opti:
> I agree on a longer TTL in general if you’re not doing maint but a short
> TTL shouldn’t cause failures by itself… unless you’re maxing a limit on
> lookups or something?
> Looks like it’s on cloudflare who claims not to cap/cut off lookups but
> maybe you have some reporting on that end you could check out/confirm
> lookup errors idk. You’d think your DNS monitoring would catch it though.

There are no lookup limits. It's not on Cloudflare. We run our own name
servers at two different providers (Flexential and Linode/Akamai).

Kai:
> A TTL of just 300 seconds is way too short IMHO. If anything happens to
> your DNS you just have five minutes to fix the problem. Set the TTL to
> at least 3600 seconds.

Benny:
> more or less static data should be ttl of minimal 12h or 43200 seconds

Google doesn't follow this advice either:

$ dig gmail.com
gmail.com. 300 IN A 142.251.15.19
gmail.com. 300 IN A 142.251.15.17
gmail.com. 300 IN A 142.251.15.18
gmail.com. 300 IN A 142.251.15.83

TTL only matters when the exact same caching server sends enough queries,
which is not the usual situation for 99% of the mail/HTTP servers out
there. We only have a few dozen hits a day for this particular domain. Even
with a million mail messages a day from 100 different sources, each
source's cache would re-query us at most once every 300 seconds, or 288
times a day, so the total would be at most 28,800 hits a day, roughly one
every 3 seconds. Most single-core servers can handle that DNS load (our
servers are much larger).
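To make that arithmetic explicit, here is a rough Python sketch of the worst-case query rate. It assumes each source sits behind its own caching resolver that re-queries only when its cached answer expires. The function names are made up for illustration, and real resolvers (prefetching, TTL caps, several cache nodes behind one address) won't match it exactly.

```python
# Back-of-the-envelope upper bound on authoritative DNS queries for one record.
# Assumption: each distinct caching resolver re-queries at most once per TTL,
# no matter how many messages it handles in between.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def max_daily_queries(ttl_seconds: int, caching_resolvers: int) -> int:
    """Worst-case queries per day if every resolver refetches as soon as its cache expires."""
    refreshes_per_resolver = SECONDS_PER_DAY // ttl_seconds
    return refreshes_per_resolver * caching_resolvers


def seconds_between_queries(ttl_seconds: int, caching_resolvers: int) -> float:
    """Average spacing between queries at that worst-case rate."""
    return SECONDS_PER_DAY / max_daily_queries(ttl_seconds, caching_resolvers)


if __name__ == "__main__":
    # Compare the 5-minute TTL with the 1-hour and 12-hour values suggested on the list.
    for ttl in (300, 3600, 43200):
        q = max_daily_queries(ttl, 100)
        gap = seconds_between_queries(ttl, 100)
        print(f"TTL {ttl:>6} s, 100 resolvers: at most {q:,} queries/day (one every {gap:.0f} s)")
```

Note that the message volume never enters the formula; only the TTL and the number of distinct caches do.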

Why so short? We have had two very bad experiences with a
network provider that required us to switch our servers at a moment's
notice. With a half-day TTL, that's not possible. We have had zero problems
with the 5-minute TTL. In fact, it ensures that our name servers are
queried more frequently, so if there are problems, we hear about them more
quickly than we would with a half-day TTL.

Rob
_______________________________________________
mailop mailing list
mailop@mailop.org
https://list.mailop.org/listinfo/mailop
