On Fri, Dec 27, 2019 at 2:05 PM James Cook <jc...@cs.berkeley.edu> wrote:
> Some data about trying to pinpoint the end of the mailing list outage.
> It looks like it's slightly different per list; I suppose this may
> reflect the dates omd updated the configurations. Dates below are
> according to the archives at mailman.agoranomic.org.

Here's a not-quite-exact chronology reconstructed from logs:

- Unknown, but no later than Oct 29, when my logs start: Gmail first
starts returning 421 errors (temporary failure) with "authentication
information" message.  At least since Oct 29, all list messages were
delivered on later attempts.

- Dec 14 23:01 UTC: Gmail first starts returning 550 errors (permanent
failure) with same error message; first affected message is this one:
https://mailman.agoranomic.org/cgi-bin/mailman/private/agora-discussion/2019-December/056000.html

[During this period, Gmail rejected most deliveries, although it
accepted some.  The list could still receive from Gmail and deliver to
other servers.]

- Around Dec 22 23:21 UTC: Reconfigured qmail on vps.qoid.us (which
hosts the lists) to forward via ec2.qoid.us.

- Around Dec 23 00:39 UTC: Fixed ec2.qoid.us mail server to send over
IPv4 instead of IPv6.  Gmail etc. don't like mail coming from IPv6
addresses because you can't do effective IP bans.

[During this period, Gmail accepted... most deliveries, albeit delayed
due to rate limits, but it did reject a lot of daily digests, which
some people are subscribed to.  Moreover, icloud.com started rejecting
all deliveries; apparently ec2.qoid.us got onto the proofpoint.com
blacklist, a "machine-learning driven content classification system".
Sigh.]

- Around Dec 24 05:53 UTC: Reconfigured Mailman to send messages
through SMTP directly to ec2.qoid.us rather than going through the
local qmail.  This shouldn't affect anything.

- Around Dec 28 00:33 UTC: Turned on From munging and DKIM signing and
switched back to vps.qoid.us.  No mass rejections since then.

In all cases, the three lists were affected at the same time (except
turning on From munging, which happened a few seconds apart for each
list).

Unfortunately, since each subscriber gets their own separate delivery
attempt (mostly), there's no clear line between the list working and
not working.  The possibility of delayed delivery makes things even
more complicated, as does the interaction with daily digests.  I do
think it's a good idea to resolve this via ratification.

Sorry for the delay in explaining what's going on.  I'm with family
for the holidays, and I end up not spending any time on non-family
stuff, even though I have plenty of time.

The "authentication information" error message in question:

550-5.7.26 This message does not have authentication information or fails to
550-5.7.26 pass authentication checks. To best protect our users from spam, the
550-5.7.26 message has been blocked. Please visit
550-5.7.26  https://support.google.com/mail/answer/81126#authentication for more
550 5.7.26 information. t17si15910193pjr.44 - gsmtp
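
For anyone digging through their own logs, the difference between the
421 and 550 phases can be read straight off the reply.  A minimal
sketch, assuming the reply is available as text in the form shown
above; parse_smtp_reply and classify are hypothetical helper names,
not part of qmail or Mailman:

```python
import re

# One reply line: a 3-digit code, then "-" (more lines follow) or " " (last
# line), an optional enhanced status code such as 5.7.26, then free text.
_LINE = re.compile(r"^(\d{3})(?:[ -](?:(\d\.\d{1,3}\.\d{1,3}) ?)?(.*))?$")

# The reply quoted above, verbatim.
REPLY = """\
550-5.7.26 This message does not have authentication information or fails to
550-5.7.26 pass authentication checks. To best protect our users from spam, the
550-5.7.26 message has been blocked. Please visit
550-5.7.26  https://support.google.com/mail/answer/81126#authentication for more
550 5.7.26 information. t17si15910193pjr.44 - gsmtp
"""


def parse_smtp_reply(text):
    """Return (code, enhanced_status, message) for a possibly multiline reply."""
    code, enhanced, parts = None, None, []
    for raw in text.strip().splitlines():
        m = _LINE.match(raw.strip())
        if m is None:
            raise ValueError(f"not an SMTP reply line: {raw!r}")
        code = int(m.group(1))
        enhanced = m.group(2) or enhanced
        parts.append((m.group(3) or "").strip())
    if code is None:
        raise ValueError("empty reply")
    return code, enhanced, " ".join(p for p in parts if p)


def classify(code):
    """4xx means the sender should retry later; 5xx means it should give up."""
    if 400 <= code <= 499:
        return "temporary"
    if 500 <= code <= 599:
        return "permanent"
    return "other"


code, enhanced, message = parse_smtp_reply(REPLY)
print(code, enhanced, classify(code))
```

A 421 reply goes down the "temporary" branch, which is why messages in
the earlier phase were still delivered on later attempts.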

The message is misleading.  Without From munging, list messages do
often fail DKIM authentication checks, because of the DIS/BUS/OFF
prefix added to the subject.  But this failure has existed for years
without causing Gmail to reject messages, although it sometimes sent
them to Spam or marked them as suspicious.  Moreover, sending the same
messages from ec2.qoid.us worked... or at least didn't fail the same
way.  So it seems like Gmail decided to distrust vps.qoid.us's IP
address.  I can think of a few possible reasons why:

- Backscatter: I recently checked the IP address against various spam
blacklists, and while it wasn't on the most common ones, it was on the
backscatterer.org blacklist.  This surprised me.  Turns out that my
server was vulnerable to a straightforward backscatter attack, where
you send mail to an intentionally invalid recipient, setting the From
address to whoever you want to spam, and the resulting bounce message
is delivered unsolicited to them.  The version of qmail I'm using has
a mechanism to reject invalid recipients synchronously within the SMTP
connection, rather than sending a bounce message... but when I first
started running the lists, I had to disable this mechanism due to a
bug.  I forgot that I still hadn't fixed that.  Oops.

Since then, I've fixed the bug and re-enabled recipient verification.
As a bonus, I also wrote some code to synchronously reject messages to
the lists if the sender isn't subscribed to that list.  This
duplicates an existing check in Mailman, which has always been
enabled, but is asynchronous.  Originally it was set to reject
messages from non-subscribers with an explanatory bounce message, but
a long time ago I had to switch it to silently ignoring them, again
for fear of backscatter spam.  Having messages silently ignored is
confusing; now I can return a proper error without risking
backscatter.  (The error will probably be returned to the sender as a
bounce message, but coming from their own delivery agent rather than
my server; it doesn't have the same issue because it knows the From
isn't forged.)  Note that this doesn't currently work for other
Mailman rejections, such as for oversized messages.
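
To illustrate the idea (this is not the actual code running on the
server), here is a minimal sketch of a synchronous check that decides
the SMTP response before the message is accepted, so no bounce is ever
generated locally.  The function name, the in-memory subscriber table,
the example.org addresses, and the choice to match on the envelope
sender are all assumptions made for illustration:

```python
# Hypothetical subscriber table: list address -> set of subscribed senders.
SUBSCRIBERS = {
    "some-list@example.org": {"alice@example.com", "bob@example.net"},
}


def smtp_response_for(sender, recipient, subscribers_by_list):
    """Pick a reply to give while the SMTP connection is still open.

    Rejecting here means the *sending* server reports the failure to its own
    user, so a forged sender address never receives mail from this server.
    """
    rcpt = recipient.strip().lower()
    if rcpt not in subscribers_by_list:
        # Recipient verification: unknown address, refuse instead of bouncing.
        return 550, "5.1.1 no such recipient"
    if sender.strip().lower() not in subscribers_by_list[rcpt]:
        # Sender check: only subscribers may post to the list.
        return 550, "5.7.1 sender is not subscribed to this list"
    return 250, "2.1.5 ok"


print(smtp_response_for("alice@example.com", "some-list@example.org", SUBSCRIBERS))
print(smtp_response_for("spammer@example.com", "some-list@example.org", SUBSCRIBERS))
print(smtp_response_for("alice@example.com", "nobody@example.org", SUBSCRIBERS))
```

Checks that need the message body, such as a size limit, can't be made
at this stage of the conversation, which is consistent with the
limitation noted above.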

Anyway, other possible reasons:

- Backscatter via Mailman: Some "bot" email addresses, like
agora-<listname>-subscr...@agoranomic.org, would send a response that
quotes your original message in full, creating the possibility of
another kind of backscatter.  Not sure if anyone was actually doing
this, but for now I've disabled these aliases; now the only way to
subscribe is by filling out the web form, and the only way to verify
the subscription is by clicking the link in the confirmation message
(as opposed to replying to it).  A bit suboptimal, especially since
the confirmation messages have a tendency to get rejected as
unsolicited mail, but meh.

- Approval requests: Technically, messages from non-subscribers were
not blackholed but held for moderation.  This resulted in approval
request messages which I had set to go to my Gmail account, so that in
theory I could spot legitimate messages – even if I usually wasn't
paying attention, because the messages were almost all just random
spam sent to the list email addresses.  Well, Gmail started marking
approval requests for spam as spam themselves, which was convenient
for me as it could help distinguish legitimate messages.  However, the
stream of "spam" from my server probably harmed its reputation.  This
was dumb; I should have changed it long ago.  In any case, now that
I'm synchronously rejecting messages from non-subscribers, the stream
of approval requests has stopped.

- Forwarding: Completely unrelated to Agora, but hosted on the same
server, I had some email aliases which forwarded all incoming mail to
Gmail accounts, which of course included spam.  I've switched this to
a different server.

Since I've addressed all these issues, as I've said, I'm hoping that
vps.qoid.us's reputation will improve and I'll be able to turn From
munging back off eventually.
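
For anyone unfamiliar with the term, From munging roughly means
rewriting the From header so the message claims to come from the list,
which lets the list server sign it with its own DKIM key.  A minimal
sketch using Python's standard email package; the "via list" display
name, the Reply-To handling, and the example addresses are assumptions
for illustration, not necessarily what Mailman does on this server:

```python
from email.message import EmailMessage
from email.utils import formataddr, parseaddr


def munge_from(msg, list_address):
    """Rewrite From to the list address, keeping the author in Reply-To."""
    name, addr = parseaddr(str(msg["From"]))
    display = f"{name or addr} via list"
    del msg["From"]  # headers must be removed before they can be replaced
    msg["From"] = formataddr((display, list_address))
    if msg["Reply-To"] is None:
        # Preserve a way to reach the original author directly.
        msg["Reply-To"] = formataddr((name, addr))
    return msg


# Hypothetical example message.
original = EmailMessage()
original["From"] = "Alice <alice@example.com>"
original["To"] = "some-list@example.org"
original["Subject"] = "DIS: hello"
original.set_content("Hi all.")

munged = munge_from(original, "some-list@example.org")
print(munged["From"])
print(munged["Reply-To"])
```

The cost is that everyone's mail client shows the list as the sender,
which is part of why turning it back off would be nice.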
