Sidney Markowitz writes:
> Sidney Markowitz wrote, On 2/5/07 7:22 AM:
> >  EXTRA_MPART_TYPE && __OE_MUA && !__FORGED_OE
> 
> I've come up with some information and some questions about this after
> looking at the results of a set of rules T_SIDNEY_* that I put into my
> sandbox.
> 
> Here is the situation: EXTRA_MPART_TYPE looks for a Content-Type header
> that contains both a content-type multipart/ specification and another
> "type=" content-type specification. At first glance that seems wrong and
> redundant, and a good spam sign given its good S/O ratio and rank.
> 
> However, it turns out that RFC 2387 specifies Content-Type
> multipart/related as having a type= field that describes the
> content-type of its root MIME section. The EXTRA_MPART_TYPE rule will
> fire on any RFC-compliant multipart/related message. It is the correct
> MIME type to use for a message that includes components referenced by
> other components. The common example would be an HTML message that
> includes images that are not external links.
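
The check being described can be sketched in Python as follows. This is a hypothetical approximation that assumes EXTRA_MPART_TYPE boils down to "the top-level Content-Type is multipart/* and also carries a type= parameter". The real SpamAssassin rule is a header regex and is not reproduced here; `extra_mpart_type` is an illustrative name. The point is that an RFC 2387-style multipart/related header trips the check by construction.

```python
from email import message_from_string


def extra_mpart_type(raw_headers: str) -> bool:
    """Hypothetical approximation of EXTRA_MPART_TYPE (not the real rule):
    True when the top-level Content-Type is multipart/* AND has a type= param."""
    msg = message_from_string(raw_headers)
    return (msg.get_content_maintype() == "multipart"
            and msg.get_param("type") is not None)


# An RFC 2387 multipart/related header carries type= for its root part,
# so it matches even though it is compliant (e.g. HTML with inline images).
related = 'Content-Type: multipart/related; type="text/html"; boundary="b1"\n\n'
alternative = 'Content-Type: multipart/alternative; boundary="b2"\n\n'
print(extra_mpart_type(related))      # True
print(extra_mpart_type(alternative))  # False
```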

Well, don't forget -- RFC-compliant != nonspam.  We're a spam-detection
tool, not RFC-compliance-detection, so sometimes an RFC-compliant feature
is still worth using as a rule.

Having said that, EXTRA_MPART_TYPE is a pretty scary rule, and the whole
area of ham FPs on mails with inline GIFs is, I suspect, pretty vast. :(
This is why we locked its score to 1.0, after all. It'd be great to sort
this out.

> Please look at past discussion on this list and in bug 5224 about
> OE_MULTIPART_RELATED. That rule was proposed in that bug and turned out
> to have a good S/O ratio. However, it was pointed out that there are
> legitimate emails that trigger it and there are no signs that can be
> used to distinguish the multipart/related header of Outlook Express spam
> from that of Outlook Express ham. The end result of the discussion was that
> Justin agreed that the rule should not be promoted out of testing.

It looks like in that bug, the rule was added to testing -- was
it removed later, after that point?

> Which brings me to EXTRA_MPART_TYPE. That rule also matches something
> which is legitimate RFC-compliant recommended usage when you want to
> send HTML mail with embedded images. If it doesn't get quite as good an S/O
> as OE_MULTIPART_RELATED, that is perhaps because there is a bit more ham
> that does this without using OE or forged OE. That does mean that you
> would see a more accurate, slightly lower S/O for OE_MULTIPART_RELATED by
> removing from its hits anything that also hit FORGED_OE.
> 
> So should we really be using the EXTRA_MPART_TYPE rule?
> 
> To get a more fine-grained idea about what is going on with it, see the
> T_SIDNEY* rules from my sandbox. The names show what they are testing:
> "OE" means Outlook Express excluding forged OE, "HTML" means messages
> containing HTML, "EMPT" means messages that match EXTRA_MPART_TYPE,
> and an "N" prefix on any of those three means "not".
> 
> I also just added T_SIDNEY_EMPT_NMPREL, T_SIDNEY_OE_EMPT_NMPREL,
> T_SIDNEY_NOE_EMPT_NMPREL to see if there are any EXTRA_MPART_TYPE emails
> that are not actually RFC 2387 multipart/related messages. That hasn't
> been run through mass test yet as I type this.
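
The *_NMPREL combinations can be sketched in Python as below. This is a hypothetical illustration: the actual T_SIDNEY_* definitions live in Sidney's sandbox and are not shown here, so `classify` and its dictionary keys simply mirror the naming scheme described above. `oe_mua` and `forged_oe` stand in for the __OE_MUA and __FORGED_OE subrules and are passed in as plain booleans rather than computed from headers.

```python
from email import message_from_string


def classify(raw_headers: str, oe_mua: bool, forged_oe: bool) -> dict:
    """Hypothetical sketch of the *_EMPT_NMPREL combinations (not sandbox code)."""
    msg = message_from_string(raw_headers)
    # EMPT: approximates EXTRA_MPART_TYPE (multipart/* plus a type= parameter).
    empt = (msg.get_content_maintype() == "multipart"
            and msg.get_param("type") is not None)
    # MPREL: the message really is multipart/related (RFC 2387).
    mprel = msg.get_content_type() == "multipart/related"
    # OE: Outlook Express, excluding forged OE (like __OE_MUA && !__FORGED_OE).
    oe = oe_mua and not forged_oe
    return {
        "EMPT_NMPREL": empt and not mprel,
        "OE_EMPT_NMPREL": oe and empt and not mprel,
        "NOE_EMPT_NMPREL": (not oe) and empt and not mprel,
    }


# A multipart/mixed header with a stray type= is the kind of message
# the NMPREL variants are meant to surface.
odd = 'Content-Type: multipart/mixed; type="text/plain"; boundary="b"\n\n'
print(classify(odd, oe_mua=True, forged_oe=False))
```

A compliant multipart/related message yields False for all three keys, so any hits on these variants would point at EXTRA_MPART_TYPE matches that RFC 2387 does not explain.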

I'd be fine with deprecating EXTRA_MPART_TYPE and replacing it with a
better rule/rules, I think.  Go for it ;)

--j.
