[Bug 7115] Adding SHA digests of MIME parts as Bayes tokens allows bayes to 'see' non-textual content

bugzilla-daemon Mon, 29 Dec 2014 07:08:14 -0800

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7115


--- Comment #9 from Mark Martinec ---
(In reply to John Hardin from comment #8)
> I'd like to ask you to consider adding something like
> textattachvis/textattachinvis, which pulls words (visible/hidden) from text
> attachments (plain or HTML, detected by MIME type or filename extension).  
> One tactic spammers use is to attach a plain text or HTML file and the body
> of the message is "please see the attachment", and the attachment is obvious
> spam or something like a phishing form. SA doesn't scan that because it's
> not strictly "visible message body text".

Isn't this how it already works?

I checked the tokenization of a multipart/mixed test message whose first
subtree was multipart/alternative with a text/plain and a text/html part,
followed by one text/plain and one text/html attachment.
Words from all four MIME parts ended up as Bayes tokens.
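
For anyone who wants to reproduce a message of this shape, here is a minimal
sketch using Python's standard email package. It is not SpamAssassin code and
does not perform Bayes tokenization; it only builds the MIME tree described
above and lists the textual leaf parts a tokenizer would be expected to see.
The function names, file names, and marker words are made up for illustration.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_test_message():
    """Build a multipart/mixed message shaped like the test case described above."""
    outer = MIMEMultipart("mixed")
    outer["Subject"] = "please see the attachment"

    # First subtree: multipart/alternative holding the visible body.
    body = MIMEMultipart("alternative")
    body.attach(MIMEText("bodyplainword", "plain"))
    body.attach(MIMEText("<p>bodyhtmlword</p>", "html"))
    outer.attach(body)

    # Two textual attachments (file names are hypothetical).
    for text, subtype, name in [
        ("attachplainword", "plain", "note.txt"),
        ("<p>attachhtmlword</p>", "html", "form.html"),
    ]:
        part = MIMEText(text, subtype)
        part.add_header("Content-Disposition", "attachment", filename=name)
        outer.attach(part)
    return outer


def textual_parts(msg):
    """Return (content type, decoded text) for every text/* leaf part."""
    found = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "us-ascii"
            found.append((part.get_content_type(), payload.decode(charset)))
    return found


if __name__ == "__main__":
    for ctype, text in textual_parts(build_test_message()):
        print(ctype, repr(text))
```

Writing the message out with as_string() and feeding it to the scanner should
make it possible to check whether each of the four marker words shows up
among the Bayes tokens.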

Please attach a sample message where you find that textual attachments
were not tokenized for Bayes.

