Murray S. Kucherawy wrote:

>> I don't think there is anything reliable there from I can see, but its
>> not unreasonable for one to hypothesize that there might be a direct
>> correlation between the number of hops and the tendency to use
>> relaxed/relaxed. It might be interesting to see if that may be a
>> motivation for using relaxed/relaxed:
>>
>> c-param vs ave # of hops (received lines)
>
> +---------------------+-----------+------------+----------+
> | avg(received_count) | hdr_canon | body_canon | count(*) |
> +---------------------+-----------+------------+----------+
> |              1.0976 |         0 |          0 |     2214 |
> |              1.0000 |         0 |          1 |        7 |
> |              1.0338 |         1 |          0 |     7569 |
> |              2.3349 |         1 |          1 |    14086 |
> +---------------------+-----------+------------+----------+
>
> Canonicalizations of "0" mean "simple", "1" is "relaxed". So there
> is possibly a correlation between use of relaxed/relaxed and the
> hop count for spam,
I just finished running this test and got the following. I stored
records (hops, hash, sdid) in a SQL table and ran the following queries:

select hash, count(*) from c14n group by hash;

+--------------------------------+
| hash              count(*)     |
|--------------------------------|
| relaxed/relaxed   5420         |
| relaxed/simple    1115         |
| simple/relaxed    2            |
| simple/simple     1314         |
+--------------------------------+

select hash, hops, sdid, count(*) from c14n
group by hops order by hops desc, hash;

+--------------------------------------------------------------+
| hash              hops   sdid                    count(*)    |
|--------------------------------------------------------------|
| relaxed/relaxed   8      gmail.com               8           |
| relaxed/relaxed   7      talamasca.ocis.net      6           |
| relaxed/simple    6      mrochek.com             49          |
| relaxed/relaxed   5      yahoo.com               474         |
| relaxed/relaxed   4      gmail.com               184         |
| simple/simple     3      maimonides.edu          84          |
| relaxed/relaxed   2      coldwatercreek.com      1483        |
| relaxed/relaxed   1      facebookmail.com        5563        |
+--------------------------------------------------------------+

(This query groups by hops only, so count(*) is the total for each hop
count, and the hash and sdid shown in a row are just one sample record
from that group.)

I had noticed that gmail.com messages had a wide range of hop counts,
so I did a query just for it:

select hash, hops, sdid, count(*) from c14n where sdid="gmail.com"
group by hops order by hops desc, hash;

+--------------------------------------------------------------+
| hash              hops   sdid                    count(*)    |
|--------------------------------------------------------------|
| relaxed/relaxed   8      gmail.com               8           |
| relaxed/relaxed   7      gmail.com               4           |
| relaxed/relaxed   6      gmail.com               14          |
| relaxed/relaxed   5      gmail.com               14          |
| relaxed/relaxed   4      gmail.com               107         |
| relaxed/relaxed   2      gmail.com               130         |
+--------------------------------------------------------------+

Looking at these messages:

  hops=2  direct private emails to users
  hops=4  xml-dev list messages
  hops=5  pop3ext, ietf-smtp list messages
  hops=6  spf-help, ietf discuss list messages
  hops=7  spf-discuss list messages
  hops=8  spf-discuss list messages

> but I have trouble envisioning that as
> something that's being actively considered by signers.
The reason we needed relaxed in the first place is that there are
many long-standing systems that are still active and evolved from UUCP
(like us) and still have those backend internal I/O designs, including
UI, report writers, text interfaces, etc., in place. The first change
was just swapping the transport method from UUCP to SMTP, and the only
interoperability requirement was to make sure the edge had the proper
LF/CRLF interface translations in place. It was never an issue until
DKIM came along.

So, for example, if the system's backend storage uses <LF>, you can
imagine that a standalone DKIM signer or verifier utility needs to take
this I/O into account when reading the file. It can't assume that all
mail storage is x822/5322 with CRLF delimiters. We can state it, but it
is really none of anyone's business how the backend data is stored as
long as the end result is the same.

So are signers/operators aware that mail mutations can happen? I think
so.

Do signers blasting one-to-many messages "believe" they need a more
relaxed integrity check to maximize DKIM verification across the many
receivers? I think so (although your stats are showing similar pass
rates for simple and relaxed).

I also think that if DKIM had a C14N option (i.e. STRIP) available to
resolve legacy throughputs for particular streams, they would use it
too, maybe on a per-target basis only. :)

Anyway, thanks.

-- 
Hector Santos, CTO
http://www.santronics.com
http://santronics.blogspot.com
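
P.S. To make the LF storage point above concrete, here is a minimal
Python sketch. It is an illustration only, not how any particular
signer is implemented. The names to_crlf, simple_body and body_hash are
made up for this example, and simple_body is my reading of the "simple"
body canonicalization in the DKIM spec (ignore trailing empty lines, end
the body with a single CRLF). It shows that a body stored with bare LF
hashes differently from the on-the-wire CRLF form unless the utility
translates the line endings first.

```python
import base64
import hashlib
import re


def to_crlf(raw: bytes) -> bytes:
    """Turn bare LF line endings into CRLF; existing CRLF pairs are left alone."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", raw)


def simple_body(body: bytes) -> bytes:
    """Sketch of 'simple' body canonicalization (assumed reading of the spec):
    drop empty lines at the end of the body, then end it with one CRLF."""
    while body.endswith(b"\r\n"):
        body = body[:-2]
    return body + b"\r\n"


def body_hash(body: bytes) -> str:
    """Base64 SHA-256 digest of the canonicalized body (a bh= style value)."""
    digest = hashlib.sha256(simple_body(body)).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical sample: the same message as stored on an LF backend
# and as it would appear on the wire over SMTP.
stored_lf = b"Hello,\n\nthis is a test.\n"
on_wire = b"Hello,\r\n\r\nthis is a test.\r\n"

# Hashing the LF file as-is does not match what the receiver computes.
print(body_hash(stored_lf) == body_hash(on_wire))

# Translating to CRLF first gives the same hash as the wire form.
print(body_hash(to_crlf(stored_lf)) == body_hash(on_wire))
```

The translation is kept as its own step so the storage format stays a
private matter of the backend; only the bytes handed to the hash need
to look like the wire form.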