https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7022

Kent Oyer <k...@mxguardian.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |k...@mxguardian.net

--- Comment #20 from Kent Oyer <k...@mxguardian.net> ---
Sorry for digging up an old thread, but hopefully this helps someone. I have
created an ASCII plugin that should alleviate this problem. The plugin is
available here:

https://github.com/mxguardian/Mail-SpamAssassin-Plugin-ASCII

There are no external dependencies, and it is very fast because the rules are
pre-compiled. Existing rules continue to work as before. The plugin just adds a
new rule type, 'ascii', that matches against body text that has been converted
to ASCII.

I've tested it on a small corpus and found a 4% reduction in false negatives
(FNs) with no change in false positives (FPs). I expect that number to improve
as more rules are converted to ASCII rules.

The problem with using something like Text::Unidecode is that it transliterates
characters based on their meaning rather than their appearance. Therefore, I had
to create my own character map.
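
To illustrate the difference, here is a minimal sketch in Python (not the
plugin's actual Perl code). The map LOOKALIKES and the function
to_ascii_by_appearance are hypothetical names made up for this example, and the
map holds only a handful of entries. A meaning-based transliterator would turn
Cyrillic "р" into "r" and "с" into "s", whereas an appearance-based map turns
them into "p" and "c", which is what a reader actually sees.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny look-alike map for illustration only.
# Keys are non-ASCII characters; values are the ASCII letters they resemble.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER (sounds like "r", looks like "p")
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES (sounds like "s", looks like "c")
}


def to_ascii_by_appearance(text):
    """Convert text to ASCII by visual similarity (illustrative sketch)."""
    out = []
    # NFKD splits accented letters into base letter + combining mark and
    # folds compatibility forms such as fullwidth letters.
    for ch in unicodedata.normalize("NFKD", text):
        if ord(ch) < 128:
            out.append(ch)              # already ASCII
        elif ch in LOOKALIKES:
            out.append(LOOKALIKES[ch])  # map by appearance
        elif unicodedata.combining(ch):
            continue                    # drop accents and other combining marks
        else:
            out.append(ch)              # unknown character: leave unchanged
    return "".join(out)


# A rule compiled once, then matched against the converted text.
RULE = re.compile(r"paypal", re.IGNORECASE)

spoofed = "P\u0430y\u0440\u0430l"  # Latin P, y, l mixed with Cyrillic а and р
converted = to_ascii_by_appearance(spoofed)
matched = bool(RULE.search(converted))
```

Here the plain pattern does not match the spoofed string directly, but it does
match once the text has been converted by appearance.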

Feedback welcome.
