https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7115
RW <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #5 from RW <[email protected]> ---
(In reply to Henrik Krohns from comment #2)
> I actually run something similar, tokenizing attachment names
> etc, but overall it make very little difference. I think it actually hurt in
> some cases, but I don't remember the exact figures anymore..

It seems likely that a binary would be reused more commonly than a filename.
IIRC the OCR plugin used to have a caching option which I presume was based on
checksum.

A single token might not have much effect on the Bayes result, but it might be
very effective to use Bayes to keep track of attachment checksums and have a
separate rule for scoring checksums only seen in spam.
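
To make the checksum idea concrete, here is a rough sketch in Python (not
SpamAssassin code, and not its Bayes store). It assumes a plain in-memory
counter per checksum; the names attachment_checksums, ChecksumStore,
spam_only_hit and the min_spam threshold are all made up for illustration.

```python
import hashlib
from collections import Counter
from email.message import Message


def attachment_checksums(msg: Message) -> set:
    """Return SHA-256 hex digests of every decoded attachment payload."""
    digests = set()
    for part in msg.walk():
        # Skip containers and parts without a filename (treated as body text).
        if part.is_multipart() or part.get_filename() is None:
            continue
        payload = part.get_payload(decode=True)  # undoes base64/quoted-printable
        if payload:
            digests.add(hashlib.sha256(payload).hexdigest())
    return digests


class ChecksumStore:
    """Hypothetical stand-in for a learned store: counts sightings per class."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def learn(self, msg: Message, is_spam: bool) -> None:
        target = self.spam if is_spam else self.ham
        for digest in attachment_checksums(msg):
            target[digest] += 1

    def spam_only_hit(self, msg: Message, min_spam: int = 2) -> bool:
        """True if any attachment was seen min_spam+ times in spam, never in ham."""
        return any(
            self.spam[d] >= min_spam and self.ham[d] == 0
            for d in attachment_checksums(msg)
        )
```

A separate rule could then score on spam_only_hit() alone, independent of the
overall Bayes probability. The threshold of 2 is an arbitrary guess meant to
avoid firing on a checksum learned from a single message.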
