Sidney Markowitz wrote, On 2/5/07 7:22 AM:
> EXTRA_MPART_TYPE && __OE_MUA && !__FORGED_OE
I've come up with some information and some questions about this after looking at the results of a set of T_SIDNEY_* rules that I put into my sandbox.

Here is the situation: EXTRA_MPART_TYPE looks for a Content-Type header that contains both a multipart/ content-type specification and another "type=" content-type specification. At first glance that seems wrong and redundant, and a good spam sign given its good S/O ratio and rank.

However, it turns out that RFC 2387 specifies Content-Type multipart/related as having a type= parameter that describes the content-type of its root MIME section. The EXTRA_MPART_TYPE rule will fire on any RFC-compliant multipart/related message. multipart/related is the correct MIME type to use for a message that includes components referenced by other components. The common example would be an HTML message that includes images that are not external links.

Please look at past discussion on this list and in bug 5224 about OE_MULTIPART_RELATED. That rule was proposed in that bug and turned out to have a good S/O ratio. However, it was pointed out that there are legitimate emails that trigger it, and there are no signs that can be used to distinguish the multipart/related header of Outlook Express mail that is spam from that of mail that is ham. The end result of the discussion was that Justin agreed that the rule should not be promoted out of testing.

Which brings me to EXTRA_MPART_TYPE. That rule also matches something which is legitimate, RFC-compliant, recommended usage when you want to send HTML mail with embedded images. If it doesn't get quite as good an S/O as OE_MULTIPART_RELATED, that is perhaps because there is a bit more ham that does this without using OE or forged OE. That does mean that you would see a more accurate, slightly lower S/O for OE_MULTIPART_RELATED by removing from the hits anything that also hit FORGED_OE.

So should we really be using the EXTRA_MPART_TYPE rule?
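To make the RFC 2387 point concrete, here is a rough Python sketch using the standard library email parser. The regex is only my approximation of the check described above (a multipart/ type plus a separate "type=" in the same Content-Type header); it is an assumption for illustration and not the actual EXTRA_MPART_TYPE rule definition. The function names and sample messages are likewise made up for the example.

```python
import re
from email import message_from_string

# ASSUMPTION: an approximation of the check described in this post, not the
# real rule. It matches a Content-Type value that has a multipart/ type
# followed somewhere later by a separate "type=" parameter.
EXTRA_TYPE_RE = re.compile(r"multipart/.*\btype\s*=", re.IGNORECASE | re.DOTALL)


def looks_like_extra_mpart_type(raw_message: str) -> bool:
    """Return True if the top-level Content-Type has multipart/ plus type=."""
    msg = message_from_string(raw_message)
    header = msg.get("Content-Type", "")
    return bool(EXTRA_TYPE_RE.search(header))


def is_rfc2387_related(raw_message: str) -> bool:
    """Return True for multipart/related carrying the RFC 2387 type parameter."""
    msg = message_from_string(raw_message)
    return (
        msg.get_content_type() == "multipart/related"
        and msg.get_param("type") is not None
    )


# Hypothetical sample: HTML mail with an embedded image, built the RFC 2387 way.
RELATED = (
    "From: sender@example.com\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/related; type="text/html"; boundary="b1"\n'
    "\n"
    "--b1\n"
    "Content-Type: text/html\n"
    "\n"
    '<html><img src="cid:img1"></html>\n'
    "--b1\n"
    "Content-Type: image/png\n"
    "Content-ID: <img1>\n"
    "\n"
    "(image data)\n"
    "--b1--\n"
)

# Hypothetical sample: ordinary multipart/mixed mail with no type= parameter.
MIXED = (
    "From: sender@example.com\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="b2"\n'
    "\n"
    "--b2\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--b2--\n"
)

if __name__ == "__main__":
    for name, raw in (("related", RELATED), ("mixed", MIXED)):
        print(name, looks_like_extra_mpart_type(raw), is_rfc2387_related(raw))
```

Under these assumptions, the compliant multipart/related sample satisfies both checks, which is the overlap that worries me: a header-only test of this kind cannot tell a well-formed HTML-with-images message from spam.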
To get a more fine-grained idea about what is going on with it, see the T_SIDNEY_* rules from my sandbox. The names show what they are testing: "OE" means Outlook Express excluding forged OE, "HTML" means messages containing HTML, "EMPT" means messages that match EXTRA_MPART_TYPE, and an "N" prefix on any of those three means "not".

I also just added T_SIDNEY_EMPT_NMPREL, T_SIDNEY_OE_EMPT_NMPREL, and T_SIDNEY_NOE_EMPT_NMPREL to see if there are any EXTRA_MPART_TYPE emails that are not actually RFC 2387 multipart/related messages. Those haven't been run through mass test yet as I type this.

-- sidney
