[Bug 7115] Adding SHA digests of MIME parts as Bayes tokens allows bayes to 'see' non-textual content

bugzilla-daemon Mon, 29 Dec 2014 07:21:47 -0800

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7115


--- Comment #10 from John Hardin ---
(In reply to Mark Martinec from comment #9)
> (In reply to John Hardin from comment #8)
> > I'd like to ask you to consider adding something like
> > textattachvis/textattachinvis, which pulls words (visible/hidden) from text
> > attachments (plain or HTML, detected by MIME type or filename extension).  
> > One tactic spammers use is to attach a plain text or HTML file and the body
> > of the message is "please see the attachment", and the attachment is obvious
> > spam or something like a phishing form. SA doesn't scan that because it's
> > not strictly "visible message body text".
> 
> Isn't this how it already works?
> 
> I checked tokenization of a test message which was a multipart/mixed,
> where the first subtree was multipart/alternative with a text/plain and
> text/html parts, followed by one text/plain and one text/html attachment.
> Words from all four MIME parts ended up as Bayes tokens.

I did not actually test this before making the suggestion. I was making the
assumption, apparently erroneous, that the Bayes tokenization behavior
paralleled the behavior for BODY rules, where text attachments are not included
because they aren't part of the "visible message body".

Thanks for actually checking, and apologies for the noise.
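
For anyone who wants to reproduce the check described in comment #9, the
layout of the test message can be sketched with Python's standard email
package. This is only a sketch: it builds the MIME tree and lists its text
parts, it does not run SpamAssassin or its Bayes tokenizer, and the part
contents, filenames, and the helper names build_test_message and text_parts
are made up for illustration.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_test_message():
    """Build multipart/mixed: an alternative body plus two text attachments."""
    # First subtree: multipart/alternative with text/plain and text/html.
    body = MIMEMultipart("alternative")
    body.attach(MIMEText("please see the attachment", "plain"))
    body.attach(MIMEText("<p>please see the attachment</p>", "html"))

    # One text/plain and one text/html attachment (hypothetical filenames).
    plain_att = MIMEText("attached plain words", "plain")
    plain_att.add_header("Content-Disposition", "attachment", filename="note.txt")
    html_att = MIMEText("<p>attached html words</p>", "html")
    html_att.add_header("Content-Disposition", "attachment", filename="form.html")

    msg = MIMEMultipart("mixed")
    msg.attach(body)
    msg.attach(plain_att)
    msg.attach(html_att)
    return msg


def text_parts(msg):
    """Return (content_type, is_attachment) for every text/* part."""
    return [
        (part.get_content_type(), part.get_content_disposition() == "attachment")
        for part in msg.walk()
        if part.get_content_maintype() == "text"
    ]


if __name__ == "__main__":
    for content_type, is_attachment in text_parts(build_test_message()):
        print(content_type, "attachment" if is_attachment else "body")
```

Writing build_test_message().as_string() to a file gives a message with four
text parts, two in the body and two attached, which can then be fed to
whatever tokenization check is being used.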
