Hi Steve,

your observations put me on the right track.  Thank you so much!

Long post below:

On Thu 24/Sep/2020 12:31:29 +0200 Stephen J. Turnbull wrote:
Alessandro Vesely writes:

First, what Mailman are you talking about?  Only Mailman 3 is likely to get
these improvements, as Mailman 2 is end-of-life.  However, Mailman 2
installations are likely to be around in large numbers for several years,
and if Mailman 2 is any evidence, likely few Mailman 3 installations would
use these features unless forced to by a disaster like the Yahoo/AOL sudden
switch to DMARC p=reject.

Reversing transformation should work with any Mailman version, and with other mailing list managers as well. Hence, one cannot rely on some precise indications, but rather on common MLM behavior.


Yet, it is possible to undo the transformation that Mailman put in place,
thereby validating the original DKIM signature.
It would always be possible to undo all transformations by supplying the
original email as part of a multipart/alternative, or perhaps a new
multipart subtype, maybe with some kind of device to make reading the
message/rfc822 original difficult in standard MUAs.

Comparing to the original can make it more difficult to check that the transformed version does not deviate from the original in unacceptable ways. While limiting the cases where original signatures can be recovered, a set of accepted transformations also limits the attack surface.


(In the case of Microsoft MUAs, if Mailman is configured to strip HTML, the
result might be less than 10% bigger than the original! ;-)

Even if plain text is safer, stripping HTML is irreversible.


It is sometimes possible to reverse transformations with only the information in the post after Mailman processing. However, some very desirable changes are destructive (eg, anonymized lists, conversion of HTML
to plain text, removal of prohibitive attachments).  Some non-destructive
changes (headers and footers) are highly customizable. So the question is
what are the transformations that users want to reverse, and whether that's
really possible.

It has to be the responsibility of the list owner to configure Mailman so that the transformations can be reversed. Some options, like anonymized lists, are clearly at odds with the need to recover the original author domain's signature.


This kind of transformation reversal probably requires no changes to Mailman, just an addition of a Handler which could be written independently
and "dropped in" (with a configuration change to the default pipeline).

On the other hand, a new header field can be abused. List-*, for example, are often found in a number of messages which don't come from mailing lists, but just aim at not being classified as UCE.


The necessary information about transformations that are configured would be
available from Mailman in the usual way (existing Handlers need that
information).

There could be more info than just the configuration; see the heuristic below.


Mailman carries out some irreversible changes, such as rewriting
To: or Cc: changing the order of the mailboxes,

Does this happen outside of DMARC mitigation?  Can you show examples?


I checked a few messages and couldn't find a switched To:. Switched Cc: seems to happen when one of the recipients is the list itself, which is then moved to the last place. (I am trying to reproduce that behavior with this message.)


or rewriting Content-Transfer-Encoding: irrespective of quotation marks
and case (for example "7bit" even if the original, signed field was
spelled as "7Bit").
I'm not aware of such behavior *unless other modifications were done*. In
that case, Mailman is specifying the C-T-E it uses, it is not rewriting the
original C-T-E.

I don't know if it's Mailman or a DKIM signing tool running afterwards, but many plain text messages from mailing lists arrive encoded as base64. Since the footer is part of the only MIME entity, the reversal has to decode the base64, remove the footer, and re-encode as base64 if that was the original C-T-E. In the latter case, one needs to know the column width of the original encoding...
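
To make the base64 case concrete, here is a minimal sketch in Python. It assumes the footer text is known exactly and that the original encoder wrapped lines at a known width; the function name strip_footer_base64 and the default width of 76 are only illustrative assumptions, not anything Mailman provides.

```python
import base64


def strip_footer_base64(encoded_body, footer, column_width=76):
    """Decode a base64 body, drop a trailing footer, re-encode at column_width.

    Assumes the footer was appended verbatim to the decoded text and that the
    author's encoder wrapped lines at column_width characters with CRLF.
    """
    raw = base64.b64decode(encoded_body)  # line breaks in the input are ignored
    footer_bytes = footer.encode("utf-8")
    if not raw.endswith(footer_bytes):
        raise ValueError("footer not found at the end of the body")
    original = raw[: len(raw) - len(footer_bytes)]
    b64 = base64.b64encode(original).decode("ascii")
    lines = [b64[i:i + column_width] for i in range(0, len(b64), column_width)]
    return "\r\n".join(lines) + "\r\n"
```

If the guessed column width is wrong, the decoded text is still the same but the re-encoded bytes differ, so the body hash of the original signature will not match.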


I guess this behavior is coded deeply in Python libraries,

I don't think so.  As far as I know, the email module in Python 3 provides
some support for parsing header fields but I don't know why this would
change order or spelling of field contents.

Aha, you're right. It is probably Mailman writing its own C-T-E, which happens to be the same as the original, albeit spelt differently.


I would guess that to the extent that it happens it has to do with Mailman-level processing (for example, collecting addresses from the same domain so they can be presented as multiple RCPT TO with a single DATA).

No, the MTA doesn't care about header addresses. Domain optimization has to be done after MX lookup.


I can say for sure that some care was taken to ensure that the order of header fields, including multiple instances of the same field is carefully
preserved.

Good news. In any case, DKIM header canonicalization has to be "relaxed", because fields are re-wrapped.
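
For reference, a rough sketch of "relaxed" header canonicalization as described in RFC 6376, Section 3.4.2, which shows why re-wrapping alone does not break a signature. The function name relaxed_header is made up for illustration; a real verifier would take this from a DKIM library.

```python
import re


def relaxed_header(name, value):
    """Canonicalize one header field per the DKIM 'relaxed' algorithm."""
    name = name.strip().lower()                    # field name in lowercase
    value = re.sub(r"\r?\n(?=[ \t])", "", value)   # unfold continuation lines
    value = re.sub(r"[ \t]+", " ", value)          # runs of WSP become one SP
    value = value.strip(" \t\r\n")                 # no WSP around the colon or at the end
    return f"{name}:{value}\r\n"
```

A folded field and its unfolded equivalent canonicalize to the same bytes, whereas "simple" canonicalization would treat them as different.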


but would like to know developers' opinions.  Is that something that could
be fixed?
First, the issues with headers could be improved, though not entirely fixed,
in DKIM itself by further canonicalizing structured headers before signing
or verifying.
I'm not saying that this is the right way forward, but it should be considered.

There have been various proposals about a MIME-compatible DKIM. It's not going to happen any time soon. There's not enough traction.


The second question is about producing a hint to the verifier telling
which transformation(s) have been applied to the message.  That would come
as an additional header field, for example:
     DKIM-Transform: footer

This could be done easily, but it would be at best a hint.  Among other
things, it might be desirable to identify the agent that performs the
transformation, as well as the algorithm and perhaps the host and/or the
list.  Mailman adds footers in different ways, specifically appending text
and adding a MIME part.  Third party patches are available that dig into
HTML structure (at least for Mailman 2).

DKIM-Transform would ease the reversing filter's job greatly. I wrote my prototype relying on it, then I wrote a bash script to add the hint. The script was easy only because it /knew/ that a footer was added one way or the other.


There are lists that feed into lists, and apply their own transformations.

IMHO, transformation reversing must not be stackable. That is, no attempt to recover a middle mediator's signature. If multiple footers or multiple subject tags are added, reversibility is lost.


or as an extra tag in a DKIM signature, for example:

     DKIM-Signature: v=1; (...) tf=footer; (...)

Not possible without a lot of effort and specific cooperation from MTAs.
Mailman doesn't DKIM sign messages, really doesn't want to (there are Python
modules for this, but use and configuration would be our responsibility so
we'd like to have specialists do it), and probably shouldn't (we're not
specialists) -- that should be left to the border MTA of the administrative
domain.

Thank you for confirming that.


That hint could spare the verifier one pass over the message.  Is it
something  that could be implemented?  If not, I'd try guessing, according
to this scheme:
You're going to have to guess a lot for a long time anyway, because very few
installations will implement this header.  It's not obvious to me that
guessing won't be nearly as accurate as the header might be.

That's a key observation!


outermost Content-Type: |  first entity Content-Type: |  transformation |
------------------------+-----------------------------+-----------------+
text/plain              |   any                       |  footer         |
------------------------+-----------------------------+-----------------+
multipart/mixed         |   multipart/mixed           |  add-part       |
------------------------+-----------------------------+-----------------+
multipart/mixed         |   any other                 |  mime-wrap      |
------------------------+-----------------------------+-----------------+
any other               |   any                       |  non-reversible |
------------------------+-----------------------------+-----------------+

Does that look correct?

Not 100%.  I'm not sure what you mean by "mime-wrap", but if it's
Mailman's "Wrap Message" DMARC mitigation, as far as I know nobody
uses it.


No, those are in the initial set of transformations in the draft I cited.

*mime-wrap* is when the original message was, say, multipart/alternative, as in an HTML message with plain text equivalent. In that case, Mailman creates a new overall part with two entities. The first is the original body of the message, the second the added footer.

*add-part* is when the original message was already multipart/mixed, as in the case of an attachment. Mailman keeps the existing structure and adds a part at the bottom, with the footer.

The mime-wrap case is so easy to reverse that it took me a good deal of time to realize that I cannot take advantage of it.

So, the kind of transformation says a bit more than the list configuration. How a footer is added depends on the message at hand.
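
As a sketch only, the guessing table quoted earlier can be transcribed literally with Python's standard email package. The function name guess_transformation is hypothetical, the rows are taken exactly as posted (including the add-part row), and nothing here checks that a footer is really present.

```python
from email import message_from_string


def guess_transformation(msg):
    """Map (outermost type, first entity type) to a transformation name."""
    outer = msg.get_content_type()
    if outer == "text/plain":
        return "footer"
    if outer == "multipart/mixed":
        parts = msg.get_payload()
        if not isinstance(parts, list) or not parts:
            return "non-reversible"  # malformed multipart, nothing to inspect
        first = parts[0].get_content_type()
        return "add-part" if first == "multipart/mixed" else "mime-wrap"
    return "non-reversible"
```

The result is only a guess to be confirmed by re-verifying the original signature after the reversal.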


I suspect that pretty much any multipart/mixed may have an added part
containing a footer, but it might not.

Here's the *heuristic* I came up with (not yet implemented):

First of all, find out a purported author domain, in any of From:, Original-From:, or Reply-To:. Check if that domain differs from the one in the Sender:. Then check if there is a failed DKIM signature by the purported author domain. In that case, try and reverse the transformation.
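
A possible shape for this first step, using only Python's standard email package. All function names are invented for the example, and whether a signature actually *failed* is assumed to come from a separate DKIM verifier; the sketch only finds author domains that differ from the Sender: domain and have a DKIM-Signature with a matching d= tag.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def domain_of(address):
    """Return the lowercased domain of an address, or '' if there is none."""
    return address.rsplit("@", 1)[1].lower() if "@" in address else ""


def purported_author_domains(msg):
    """Domains found in From:, Original-From: and Reply-To:."""
    values = []
    for field in ("From", "Original-From", "Reply-To"):
        values.extend(msg.get_all(field, []))
    return {domain_of(addr) for _, addr in getaddresses(values)} - {""}


def dkim_signing_domains(msg):
    """Collect the d= tag of every DKIM-Signature field."""
    domains = set()
    for sig in msg.get_all("DKIM-Signature", []):
        for tag in sig.split(";"):
            key, sep, val = tag.partition("=")
            if sep and key.strip() == "d":
                domains.add("".join(val.split()).lower())
    return domains


def reversal_candidates(msg):
    """Author domains that differ from Sender: and have a signature to retry."""
    sender_domain = domain_of(parseaddr(msg.get("Sender", ""))[1])
    authors = purported_author_domains(msg) - {sender_domain}
    return authors & dkim_signing_domains(msg)
```

An empty result means there is nothing worth reversing for this message.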

For the header, if the From: was rewritten, put it back. If the Subject: starts with a bracket, remove it. If any Original-* (or whatever) are present, replace them.
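
The header step might look roughly like this. The Original-* field names are the hypothetical ones discussed in this thread, the set of restorable fields is an arbitrary choice for the example, and restore_header is not an existing function anywhere.

```python
import re
from email import message_from_string

SUBJECT_TAG = re.compile(r"^\s*\[[^\]]*\]\s*")  # a leading "[listname] " tag
RESTORABLE = ("From", "Subject")                # assumption: fields to put back


def restore_header(msg):
    """Strip a leading subject tag and put Original-* values back in place."""
    subject = msg.get("Subject")
    if subject is not None and SUBJECT_TAG.match(subject):
        msg.replace_header("Subject", SUBJECT_TAG.sub("", subject, count=1))
    for name in RESTORABLE:
        original = msg.get("Original-" + name)
        if original is None:
            continue
        if name in msg:
            msg.replace_header(name, original)  # keeps the field's position
        else:
            msg[name] = original
        del msg["Original-" + name]
    return msg
```

Removing the Original-* fields is only right if the author domain did not sign them; a "hard" signer would need them kept.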

For the body, look for a line consisting of underscores or dashes. Check it is in a text/plain MIME entity, either its own entity or the whole body. Check that line is not followed by more than, say, 10 lines of text at the end of the message. If found, remove it as per the table above.
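
For the text/plain case, the footer search could be sketched as follows. The minimum separator length of four characters and the name strip_text_footer are assumptions made for the example; the limit of 10 lines is the "say, 10" from the paragraph above.

```python
import re

SEPARATOR = re.compile(r"^[_-]{4,}\s*$")  # a line of underscores or dashes
MAX_FOOTER_LINES = 10


def strip_text_footer(body):
    """Return body without its trailing footer, or None if none is found."""
    lines = body.splitlines(keepends=True)
    # Scan from the end so that the last separator line is the one considered.
    for i in range(len(lines) - 1, -1, -1):
        if SEPARATOR.match(lines[i]):
            if len(lines) - i - 1 <= MAX_FOOTER_LINES:
                return "".join(lines[:i])
            return None  # too much text after the separator to be a footer
    return None
```

A signature delimiter such as "-- " is too short to match the separator pattern, so an author's own signature block is left alone.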


I think that would work for most of the mailing lists I'm subscribed to, for "mild" DKIM signers. "Hard" DKIM signers, e.g. those who sign Sender:, will have to adjust Original-* fields by trial and error.

At this point, I'm not so sure that specifying a DKIM-Transform: header field to ease transformation reversal is a good idea. Obviously, it would apply to future transformations only. In the future we'll have more powerful machines, so if the only advantage is to gain some efficiency, it becomes questionable.


Currently, there are mailing lists which don't do any change, not even subject tags, in order to avoid breaking DKIM signatures. A somewhat Procrustean solution.
It's the ONLY guaranteed solution, though, because avoiding rewriting is
only possible if you *know* that you're distributing modified posts only to
sites participating in your reversible modifications protocol (or ignore
DMARC p=reject).

Exactly.


I don't think From: rewriting is going to be disabled any time soon.

You're right.  You need universal deployment of reverse transformation to
make disabling rewriting palatable.

Reply-To: usually comes after From:, thereby requiring to go back to
change already parsed fields.

That's not a problem, since DKIM requires reordering fields anyway. The
expensive part is not fiddling with the header, it's multiple passes of the
signature algorithm.

Good point!

Knowing the kind of transformation beforehand can save one pass through the message.


As an alternative, I'd provide for yet another field to be put near the
top of the header.
It's not an alternative.  The changes to Reply-To or Cc are *necessary* (in
the opinion of the list admin, not Mailman) to preserve the ability of the
recipient MTA to respond to author.

The goal is different. Transformation reversal recognizes the original signature, thereby affecting the aggregate reports that senders receive. According to DMARC marketing, the latter might influence their decision to switch to a strict DMARC policy.

As a side effect, when the MDA sees that an Original-From: was authenticated, it can restore it in place of From:, after any external forwarding but before storing the message. That way, users recover the ability to reply privately to the author.


Original-From:, say. This may seem redundant, however it serves a different goal. In addition, if the Original-From: is put in place by the
original signer, it ratifies its knowledge that From: will be rewritten
and its willingness to recover it afterwards.
Could work, but addition of Original-From should be done by DMARC originators, not by Mailman.

Yes.


The name should probably be DMARC-Original-From, as well.


Yeah, or DKIM-Original-From, or whatever. It's important that it be new. I sometimes see X-Original-From:, which the new field shouldn't be conflated with. (Or should it?)

Note that, as mentioned above, if the author domain encodes a plain text message in base64, it should also add something like:

    Original-Content-Transfer-Encoding: base64; column-width=76


Is this endeavor completely useless, given that the current settings work
well enough?  Or could it help keeping a consistent DMARC semantics among
participants yearning to do so?  I'd be glad to hear your opinions...

I don't think it's useless, but I don't see any reason for Mailman to
participate until there's (1) a specification of transformations that
people want to be reversible, or (2) specific defects that, if fixed, or
(3) features that, if added, would enable reversibility.

For (1), we would just guarantee a particular recognizable format for
transformations that should be reversible, and (2) and (3) would be
addressed as usual.

As mentioned, the hinting function can be done well enough by a
user-supplied Handler that looks up the list's configuration, determines
the transformations that are applied, and inserts the hints in the
appropriate place.

The draft I cited provided for an IANA registry containing the set of accepted reversible transformations. I'll try and propose an alternative specification, based on the above heuristic (once I have implemented it). The question of a DKIM-Transform: header field and a registry of its possible content should probably be discussed at that time.

One case I can think of is a list with an HTML footer. It may use an <hr> instead of many underscores. How would that heuristically be discovered? Adding an entry in a registry would be a way to add new possibilities. On the other hand, HTML presents a wider attack surface. Section 8.2 of DKIM still deprecates using l=, as it can be used for "exploiting lax HTML parsing in the MUA" in order for "the appended content to completely replace the original content in the end recipient's eyes".


Finally, contrary to what we all wish were true, this is not really a choice
for mailing lists.  It's a choice for recipient ad-dom border MTAs.  If they
don't buy in in large numbers, I'm not particularly interested in doing the
work.  I don't see why they would.


IME, MTAs are much more interested in DKIM signing than verifying. Anyway, sooner or later I'll release my filter with reverting capabilities. Even if it is not widely used, a site can at least recognize its own signatures when messages come back from a list.


Most lists I participate in do things like strip large attachments and strip
prohibited executables.  I think those are very common in general. Most
lists I participate in also strip text/html alternatives and many convert
text/html to text/plain.  If that's the common case, why would a postmaster
bother (unless they're a DMARC purist, of course, which may be a good thing
but I don't think there are very many of them)?

That is the list's choice.


If the postmasters aren't bothering, why would list admins?  If the list
admins aren't bothering, why should Mailman?

Perhaps it's me, but I feel a bit of reluctance in From: rewriting. It /has/ to be done, so savvy list admins do it. However, nobody seems to like it. If that feeling is correct, a little bit of adoption should show up...


On the other hand, I do think that Mailman can and should enable
better analytics on posts by ensuring that we only change parts of a
message that we intentionally change.  I guess that might include
situations like the case where Mailman changes a MIME part,
reassembles the whole message, and assigns a C-T-E that happens to
have the same semantics as the original C-T-E.


Those changes are benign.


I encourage you (or anyone in the reversible transformation effort) to
report inadvertent changes as bugs, and to suggest candidates for
"standard formats" we could adopt to make ad hoc reversals more
reliable (eg, the list name in a Subject tag should be enclosed in
square brackets and match the last component of the List-ID -- not
clear how that works with internationalized lists though).


We need to experiment. For example, author domains may want to try adding an Original-Subject:.


I can't speak for other developers at this point, so I can't promise
any proposals would be implemented, but I'm certainly interested and
in some cases would definitely be an "advocate on the inside".


Thank you for your commitment.


Best
Ale
--
_______________________________________________
Mailman-Developers mailing list -- mailman-developers@python.org
To unsubscribe send an email to mailman-developers-le...@python.org
https://mail.python.org/mailman3/lists/mailman-developers.python.org/
Mailman FAQ: https://wiki.list.org/x/AgA3

Security Policy: https://wiki.list.org/x/QIA9
