Sidney Markowitz wrote, On 2/5/07 7:22 AM:
>  EXTRA_MPART_TYPE && __OE_MUA && !__FORGED_OE

I've come up with some information and some questions about this after
looking at the results of a set of rules T_SIDNEY_* that I put into my
sandbox.

Here is the situation: EXTRA_MPART_TYPE looks for a Content-Type header
that contains both a multipart/ content type and a separate "type="
parameter naming another content type. At first glance that looks wrong
and redundant, and a good spam sign given its good S/O ratio and rank.

However, it turns out that RFC 2387 specifies Content-Type
multipart/related as having a type= field that describes the
content-type of its root MIME section. The EXTRA_MPART_TYPE rule will
fire on any RFC-compliant multipart/related message, and
multipart/related is the correct MIME type to use for a message that
includes components referenced by other components. The common example
would be an HTML message that includes images that are not external
links.
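
To make that concrete, here is a minimal Python sketch that builds an
RFC 2387 style message with the standard email package and shows the
"type=" parameter landing in the Content-Type header. The function
names and the regex are my own illustration: approx_extra_mpart_type()
is only a rough approximation of what the rule is described as doing
above, not the actual EXTRA_MPART_TYPE definition.

```python
import re
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_related_message():
    """Build an HTML message with an embedded image, as multipart/related."""
    # The type= parameter names the content type of the root part (RFC 2387).
    msg = MIMEMultipart("related", type="text/html")
    msg.attach(MIMEText('<img src="cid:logo">', "html"))
    # Placeholder bytes, not a real PNG; the subtype is given explicitly.
    img = MIMEImage(b"placeholder", _subtype="png")
    img.add_header("Content-ID", "<logo>")
    msg.attach(img)
    return msg


def approx_extra_mpart_type(content_type):
    """Hypothetical stand-in for the rule: multipart/ plus a type= parameter."""
    return re.search(r"multipart/.*\btype=", content_type, re.I | re.S) is not None


if __name__ == "__main__":
    msg = build_related_message()
    print(msg["Content-Type"])
    print(approx_extra_mpart_type(msg["Content-Type"]))
```

A perfectly legitimate message built this way carries both the
multipart/ type and a "type=" parameter, so a pattern along these lines
cannot tell it apart from spam on the header alone.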

Please look at past discussion on this list and in bug 5224 about
OE_MULTIPART_RELATED. That rule was proposed in that bug and turned out
to have a good S/O ratio. However, it was pointed out that there are
legitimate emails that trigger it, and there are no signs in the
multipart/related header that can be used to distinguish Outlook Express
mail that is spam from Outlook Express mail that is ham. The end result
of the discussion was that Justin agreed that the rule should not be
promoted out of testing.

Which brings me to EXTRA_MPART_TYPE. That rule also matches something
which is legitimate, RFC-compliant, recommended usage when you want to
send HTML mail with embedded images. If it doesn't get quite as good an
S/O as OE_MULTIPART_RELATED, it's perhaps because there is a bit more ham
that does that without using OE or forged OE. That does mean that you
would see a more accurate, slightly lower S/O for OE_MULTIPART_RELATED by
removing from the hits anything that also hit FORGED_OE.

So should we really be using the EXTRA_MPART_TYPE rule?

To get a more fine-grained idea about what is going on with it, see the
T_SIDNEY_* rules in my sandbox. The names show what they are testing,
with "OE" meaning Outlook Express excluding forged OE, "HTML" meaning
messages with HTML, "EMPT" meaning messages that match EXTRA_MPART_TYPE,
and an "N" prefix on any of those three meaning "not".

I also just added T_SIDNEY_EMPT_NMPREL, T_SIDNEY_OE_EMPT_NMPREL, and
T_SIDNEY_NOE_EMPT_NMPREL to see if there are any EXTRA_MPART_TYPE emails
that are not actually RFC 2387 multipart/related messages. Those haven't
been run through a mass test yet as I type this.
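
In case the naming scheme is hard to read at a glance, here is a small
Python sketch that decodes one of these rule names into the conditions
it tests. The decode_rule_name() helper is purely illustrative and not
part of the sandbox, and it assumes that "MPREL" stands for
multipart/related and that the "N" prefix applies to it the same way as
to the other tags.

```python
# Tags used in the T_SIDNEY_* names; "MPREL" is assumed to mean
# multipart/related.
KNOWN_TAGS = {"OE", "HTML", "EMPT", "MPREL"}
PREFIX = "T_SIDNEY_"


def decode_rule_name(name):
    """Map a T_SIDNEY_* rule name to {tag: True/False} conditions."""
    if not name.startswith(PREFIX):
        raise ValueError("not a T_SIDNEY_ rule: " + name)
    conditions = {}
    for token in name[len(PREFIX):].split("_"):
        if token in KNOWN_TAGS:
            conditions[token] = True       # condition must match
        elif token.startswith("N") and token[1:] in KNOWN_TAGS:
            conditions[token[1:]] = False  # "N" prefix means "not"
        else:
            raise ValueError("unknown tag: " + token)
    return conditions


if __name__ == "__main__":
    print(decode_rule_name("T_SIDNEY_NOE_EMPT_NMPREL"))
```

So T_SIDNEY_NOE_EMPT_NMPREL reads as: not Outlook Express, matches
EXTRA_MPART_TYPE, and not multipart/related.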

 -- sidney
