On Fri, Dec 27, 2019 at 2:05 PM James Cook <jc...@cs.berkeley.edu> wrote:
> Some data about trying to pinpoint the end of the mailing list outage.
> It looks like it's slightly different per list; I suppose this may
> reflect the dates omd updated the configurations. Dates below are
> according to the archives at mailman.agoranomic.org.
Here's a not-quite-exact chronology reconstructed from logs:

- Unknown, but no later than Oct 29, when my logs start: Gmail first starts returning 421 errors (temporary failure) with the "authentication information" message. At least since Oct 29, all list messages were delivered on later attempts.

- Dec 14 23:01 UTC: Gmail first starts returning 550 errors (permanent failure) with the same error message; the first affected message is this one:
  https://mailman.agoranomic.org/cgi-bin/mailman/private/agora-discussion/2019-December/056000.html

  [During this period, Gmail rejected most deliveries, although it accepted some. The list could still receive from Gmail and deliver to other servers.]

- Around Dec 22 23:21 UTC: Reconfigured qmail on vps.qoid.us (which hosts the lists) to forward via ec2.qoid.us.

- Around Dec 23 00:39 UTC: Fixed the ec2.qoid.us mail server to use IPv4 instead of IPv6. Gmail etc. don't like mail coming from IPv6 because you can't do effective IP bans.

  [During this period, Gmail accepted... most deliveries, albeit delayed due to rate limits, but it did reject a lot of daily digests, which some people are subscribed to. Moreover, icloud.com started rejecting all deliveries; apparently ec2.qoid.us got onto the proofpoint.com blacklist, a "machine-learning driven content classification system". Sigh.]

- Around Dec 24 05:53 UTC: Reconfigured Mailman to send messages through SMTP directly to ec2.qoid.us rather than going through the local qmail. This shouldn't affect anything.

- Around Dec 28 00:33 UTC: Turned on From munging and DKIM signing and switched back to vps.qoid.us. No mass rejections since then.

In all cases, the three lists were affected at the same time (except turning on From munging, which happened a few seconds apart for each list). Unfortunately, since each subscriber gets their own separate delivery attempt (mostly), there's no clear line between the list working and not working.
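The 421-versus-550 distinction in the chronology is what separates "delayed" from "lost": per RFC 5321, a 4xx reply is a transient failure, so the sending server keeps the message queued and retries later, while a 5xx reply is permanent and the message gets bounced. Below is a minimal Python sketch of classifying a reply like Gmail's, assuming the usual multi-line format (a hyphen after the code on every line but the last) and RFC 3463 enhanced status codes. The names parse_reply and SmtpReply are hypothetical, not part of qmail, Mailman, or any library.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Enhanced status code (RFC 3463): class.subject.detail, e.g. "5.7.26".
ENHANCED = re.compile(r"([245]\.\d{1,3}\.\d{1,3})\s*(.*)")


@dataclass
class SmtpReply:
    code: int                # basic reply code, e.g. 421 or 550
    enhanced: Optional[str]  # enhanced status code, if the server sent one
    text: str                # human-readable text with line prefixes removed

    @property
    def kind(self) -> str:
        # RFC 5321: 2yz = success; 4yz = transient failure (the sending MTA
        # queues the message and retries); 5yz = permanent failure (the
        # sending MTA gives up and generates a bounce).
        classes = {2: "success", 4: "temporary", 5: "permanent"}
        return classes.get(self.code // 100, "other")


def parse_reply(raw: str) -> SmtpReply:
    """Parse a possibly multi-line SMTP reply.

    Every line starts with the same three-digit code; a hyphen after the
    code means more lines follow, a space marks the last line.
    """
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    code = int(lines[0][:3])
    enhanced: Optional[str] = None
    parts = []
    for line in lines:
        rest = line[4:].strip()  # drop "550-" or "550 "
        match = ENHANCED.match(rest)
        if match:
            enhanced = enhanced or match.group(1)
            rest = match.group(2)
        parts.append(rest)
    return SmtpReply(code, enhanced, " ".join(parts))


GMAIL_REPLY = """\
550-5.7.26 This message does not have authentication information or fails to
550-5.7.26 pass authentication checks. To best protect our users from spam, the
550-5.7.26 message has been blocked. Please visit
550-5.7.26  https://support.google.com/mail/answer/81126#authentication for more
550 5.7.26 information. t17si15910193pjr.44 - gsmtp"""

reply = parse_reply(GMAIL_REPLY)
print(reply.code, reply.enhanced, reply.kind)  # 550 5.7.26 permanent
```

A 421 reply would come out as "temporary" under the same function, which is why the earlier phase only delayed messages instead of losing them.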
The possibility of delayed delivery makes things even more complicated, as does the interaction with daily digests. I do think it's a good idea to resolve this via ratification.

Sorry for the delay in explaining what's going on. I'm with family for the holidays, and I end up not spending any time on non-family stuff, even though I have plenty of time.

The "authentication information" error message in question:

    550-5.7.26 This message does not have authentication information or fails to
    550-5.7.26 pass authentication checks. To best protect our users from spam, the
    550-5.7.26 message has been blocked. Please visit
    550-5.7.26  https://support.google.com/mail/answer/81126#authentication for more
    550 5.7.26 information. t17si15910193pjr.44 - gsmtp

The message is misleading. Without From munging, list messages do often fail DKIM authentication checks, because the DIS/BUS/OFF prefix the list adds to the subject invalidates the original sender's DKIM signature. But this failure has existed for years without causing Gmail to reject messages, although it sometimes sent them to Spam or marked them as suspicious. Moreover, sending the same messages from ec2.qoid.us worked... or at least didn't fail the same way. So it seems like Gmail decided to distrust vps.qoid.us's IP address.

I can think of a few possible reasons why:

- Backscatter: I recently checked the IP address against various spam blacklists, and while it wasn't on the most common ones, it was on the backscatterer.org blacklist. This surprised me. It turns out that my server was vulnerable to a straightforward backscatter attack, where you send mail to an intentionally invalid recipient, setting the (envelope) From address to whoever you want to spam, and the resulting bounce message is delivered unsolicited to them. The version of qmail I'm using has a mechanism to reject invalid recipients synchronously within the SMTP connection, rather than sending a bounce message... but when I first started running the lists, I had to disable this mechanism due to a bug.
  I forgot that I still hadn't fixed that. Oops.

Since then, I've fixed the bug and re-enabled recipient verification. As a bonus, I also wrote some code to synchronously reject messages to the lists if the sender isn't subscribed to that list. This duplicates an existing check in Mailman, which has always been enabled, but is asynchronous. Originally it was set to reject messages from non-subscribers with an explanatory bounce message, but a long time ago I had to switch it to silently ignoring them, again for fear of backscatter spam. Having messages silently ignored is confusing; now I can return a proper error without risking backscatter. (The error will probably be returned to the sender as a bounce message, but coming from their own delivery agent rather than my server; that doesn't have the same issue, because their delivery agent knows the From isn't forged.) Note that this doesn't currently work for other Mailman rejections, such as for oversized messages.

Anyway, other possible reasons:

- Backscatter via Mailman: Some "bot" email addresses, like agora-<listname>-subscr...@agoranomic.org, would send a response that quotes your original message in full, creating the possibility of another kind of backscatter. Not sure if anyone was actually doing this, but for now I've disabled these aliases; now the only way to subscribe is by filling out the web form, and the only way to verify the subscription is by clicking the link in the confirmation message (as opposed to replying to it). A bit suboptimal, especially since the confirmation messages have a tendency to get rejected as unsolicited mail, but meh.

- Approval requests: Technically, messages from non-subscribers were not blackholed but held for moderation. This resulted in approval request messages, which I had set to go to my Gmail account so that in theory I could spot legitimate messages, even if I usually wasn't paying attention, because the messages were almost all just random spam sent to the list email addresses.
  Well, Gmail started marking approval requests for spam as spam themselves, which was convenient for me, as it could help distinguish legitimate messages. However, the stream of "spam" from my server probably harmed its reputation. This was dumb; I should have changed it long ago. In any case, now that I'm synchronously rejecting messages from non-subscribers, the stream of approval requests has stopped.

- Forwarding: I had some email aliases, completely unrelated to Agora but hosted on the same server, which forwarded all incoming mail to Gmail accounts; that of course included spam. I've switched these to a different server.

Since I've addressed all these issues, as I've said, I'm hoping that vps.qoid.us's reputation will improve and I'll be able to turn From munging back off eventually.
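The core fix, rejecting invalid recipients and non-subscribers during the SMTP conversation instead of accepting the message and bouncing it later, can be sketched as a per-recipient policy function. This is a minimal Python illustration, not the actual qmail patch or the code running on vps.qoid.us: check_rcpt, the hard-coded member list, and the example addresses are hypothetical, and it assumes the subscription check uses the envelope sender at RCPT TO time.

```python
from typing import Dict, Set, Tuple

# Hypothetical configuration: a real server would consult qmail's recipient
# database and Mailman's membership lists rather than hard-coded sets.
LOCAL_DOMAIN = "agoranomic.org"
LIST_MEMBERS: Dict[str, Set[str]] = {
    "agora-discussion": {"alice@example.com", "bob@example.net"},
}
OTHER_MAILBOXES: Set[str] = {"postmaster"}


def check_rcpt(envelope_from: str, rcpt_to: str) -> Tuple[int, str]:
    """Return the SMTP reply to send for one RCPT TO command.

    The decision is made while the client is still connected, so a rejection
    goes back to whatever client is talking to us. If that is a legitimate
    sender's outgoing server, it bounces to its own user; this server never
    mails a possibly forged sender address.
    """
    local, _, domain = rcpt_to.lower().partition("@")
    if domain != LOCAL_DOMAIN:
        # Not our domain: accepting it would make us an open relay.
        return 550, "5.7.1 Relaying denied"
    if local in LIST_MEMBERS:
        # Assumption: subscription is checked against the envelope sender,
        # since the From: header is not available until after DATA.
        if envelope_from.lower() not in LIST_MEMBERS[local]:
            return 550, "5.7.1 Sender is not subscribed to this list"
        return 250, "2.1.5 OK"
    if local in OTHER_MAILBOXES:
        return 250, "2.1.5 OK"
    # Unknown local recipient: refuse now instead of accepting and bouncing.
    return 550, "5.1.1 No such user"


print(check_rcpt("alice@example.com", "agora-discussion@agoranomic.org"))
# (250, '2.1.5 OK')
print(check_rcpt("spammer@example.org", "no-such-user@agoranomic.org"))
# (550, '5.1.1 No such user')
```

Because a rejected message is never accepted, this server has nothing to bounce, which is what closes the backscatter hole; the asynchronous Mailman check can stay in place as a second line of defense.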