https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7115

RW <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #5 from RW <[email protected]> ---
(In reply to Henrik Krohns from comment #2)
> I actually run something similar, tokenizing attachment names
> etc, but overall it makes very little difference. I think it actually hurt in
> some cases, but I don't remember the exact figures anymore.

It seems likely that an attachment's binary content would be reused more
commonly than its filename. IIRC the OCR plugin used to have a caching option,
which I presume was based on a checksum.

A single token might not have much effect on the Bayes result, but it might be
very effective to use Bayes to keep track of attachment checksums and have a
separate rule for scoring checksums seen only in spam.
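
As a rough illustration of that idea, here is a minimal Python sketch. It is
not SpamAssassin code and does not use the Bayes store: the ChecksumTracker
class, its method names, the in-memory dict, the min_spam threshold and the
choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


class ChecksumTracker:
    """Hypothetical stand-in for a Bayes-style store keyed by attachment checksum."""

    def __init__(self):
        # Maps checksum -> [spam_count, ham_count].
        self.counts = defaultdict(lambda: [0, 0])

    @staticmethod
    def checksum(data: bytes) -> str:
        # SHA-256 is an assumption; any stable digest of the raw bytes would do.
        return hashlib.sha256(data).hexdigest()

    def learn(self, data: bytes, is_spam: bool) -> None:
        # Record one sighting of this attachment in a spam or ham message.
        self.counts[self.checksum(data)][0 if is_spam else 1] += 1

    def seen_only_in_spam(self, data: bytes, min_spam: int = 1) -> bool:
        # The "separate rule": fire only if the checksum has been learned
        # from spam at least min_spam times and never from ham.
        spam, ham = self.counts.get(self.checksum(data), (0, 0))
        return ham == 0 and spam >= min_spam
```

A rule built on this would call learn() for each attachment during training
and score a message when seen_only_in_spam() returns true for any of its
attachments. A checksum that later turns up in ham stops triggering the rule.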

-- 
You are receiving this mail because:
You are the assignee for the bug.