> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Hector Santos
> Sent: Wednesday, May 18, 2011 1:49 PM
> To: IETF-DKIM
> Subject: Re: [ietf-dkim] New canonicalizations
>
> Whatever the actual reason, since its not the default and the reality
> the option exist and serves a purpose, there is an reasonable
> practical explanation there is a certain population of domains seeking
> the path of least resistance with reduced accidental <cr><lf>
> injections and mutations along the path as its very possible to occur
> in our heterogeneous networks of Unix (LF), MAC (CR) or DOS (CRLF)
> transport, gateways and storage I/O differences.

I think you're asking for a count of domains using various canonicalizations that produce spam. Here's what we have:

+------------------------+-----------+------------+
| count(distinct domain) | hdr_canon | body_canon |
+------------------------+-----------+------------+
|                    214 |         0 |          0 |
|                      1 |         0 |          1 |
|                     62 |         1 |          0 |
|                   3805 |         1 |          1 |
+------------------------+-----------+------------+

This counts a domain as "spammy" if the mail we've seen signed by that domain is labeled as spam by Spamassassin at least 50% of the time, just as a starting point. But if instead I report on less than 50% (relatively clean domains), the ratios are about the same:

+------------------------+-----------+------------+
| count(distinct domain) | hdr_canon | body_canon |
+------------------------+-----------+------------+
|                   2703 |         0 |          0 |
|                      6 |         0 |          1 |
|                   2238 |         1 |          0 |
|                  20573 |         1 |          1 |
+------------------------+-----------+------------+

So I don't think a conclusion's really possible here.

> I don't think there is anything reliable there from I can see, but its
> not unreasonable for one to hypothesize that there might be a direct
> correlation between the number of hops and the tendency to use
> relaxed/relaxed. It might be interesting to see if that may be a
> motivation for using relaxed/relaxed:
>
> c-param vs ave # of hops (received lines)

+---------------------+-----------+------------+----------+
| avg(received_count) | hdr_canon | body_canon | count(*) |
+---------------------+-----------+------------+----------+
|              1.0976 |         0 |          0 |     2214 |
|              1.0000 |         0 |          1 |        7 |
|              1.0338 |         1 |          0 |     7569 |
|              2.3349 |         1 |          1 |    14086 |
+---------------------+-----------+------------+----------+

Canonicalizations of "0" mean "simple", "1" is "relaxed".

So there is possibly a correlation between use of relaxed/relaxed and the hop count for spam, but I have trouble envisioning that as something that's being actively considered by signers.
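
For anyone who wants to put a single number on that spam hop-count gap, here is a rough sketch that pools the three non-relaxed/relaxed rows into one message-weighted average and sets it beside the relaxed/relaxed average. The figures are copied from the spam table above; the names SPAM_HOPS and pooled_average are made up for illustration, and this is only a descriptive comparison, not a significance test.

```python
# Spam hop-count report, keyed by (hdr_canon, body_canon); 0 = simple, 1 = relaxed.
# Each value is (avg(received_count), count(*)) as listed in the table above.
# SPAM_HOPS and pooled_average are illustrative names, not part of any tool.
SPAM_HOPS = {
    (0, 0): (1.0976, 2214),
    (0, 1): (1.0000, 7),
    (1, 0): (1.0338, 7569),
    (1, 1): (2.3349, 14086),
}


def pooled_average(rows):
    """Combine per-group averages into one mean, weighted by message count."""
    total = sum(count for _, count in rows)
    return sum(avg * count for avg, count in rows) / total


relaxed_relaxed = SPAM_HOPS[(1, 1)][0]
others = pooled_average(
    [row for key, row in SPAM_HOPS.items() if key != (1, 1)]
)

print(f"relaxed/relaxed avg hops: {relaxed_relaxed:.4f}")
print(f"all other modes avg hops: {others:.4f}")
```

Weighting by count(*) keeps the seven simple/relaxed messages from counting as much as the thousands in the other rows.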

The same report for non-spam, however, shows that there's probably not much of a statistically significant difference:

+---------------------+-----------+------------+----------+
| avg(received_count) | hdr_canon | body_canon | count(*) |
+---------------------+-----------+------------+----------+
|              1.2570 |         0 |          0 |   220497 |
|              1.0971 |         0 |          1 |      412 |
|              1.4505 |         1 |          0 |   172136 |
|              2.0206 |         1 |          1 |   980337 |
+---------------------+-----------+------------+----------+

I don't know where all this is leading, but there you go.

-MSK
_______________________________________________
NOTE WELL: This list operates according to
http://mipassoc.org/dkim/ietf-list-rules.html
