https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7490

            Bug ID: 7490
           Summary: Match for any UTF-8 character stopped working in 3.4.1
           Product: Spamassassin
           Version: 3.4.1
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Libraries
          Assignee: [email protected]
          Reporter: [email protected]
  Target Milestone: Undefined

Hello,

A few weeks ago we noticed that some custom body rules no longer match on
some of our company's mail servers. After some investigation we found out
that the cause is a regression in the new SpamAssassin version - the rules
stopped working on servers with 3.4.1 but still work correctly on servers
with 3.4.0.

(I verified this by upgrading one of the 3.4.0 servers - Debian Jessie - to
3.4.1 from jessie-backports. The rules stopped working, and downgrading back
to 3.4.0 made them work again.)

The rules in question look something like this one:

body CHECK /volba v.s kontaktovat se vzbudila z geografick. povahy/i

This phrase reads "volba vás kontaktovat se vzbudila z geografické povahy" in
the original message. It is a poor - most likely automated - translation. (The
whole message is obvious spam.) Since the phrase is still readable without
the accented characters (vas, geograficke), we wanted to catch both variants
with a single rule.

From what we have been able to find out, SA 3.4.1 compares the strings
byte-wise, regardless of multibyte characters, so a single dot no longer
matches an accented character and the rule fails. The rule can be sort of
fixed by replacing each dot with two dots (both accented characters are
2 bytes long in UTF-8), but then it fails to detect the purely ASCII variant
of the phrase.
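
To illustrate the behaviour described above outside of SpamAssassin, here is
a minimal sketch in Python. It is not SpamAssassin code and makes no claim
about how 3.4.1 works internally; it only assumes that the message text is
UTF-8 and compares matching on decoded characters with matching on raw bytes.
All variable names are made up for this illustration.

```python
import re

# The two variants of the phrase: with accented characters and plain ASCII.
# Escapes are used so the accented letters are single code points.
phrase_utf8 = "volba v\u00e1s kontaktovat se vzbudila z geografick\u00e9 povahy"
phrase_ascii = "volba vas kontaktovat se vzbudila z geograficke povahy"
phrases = (phrase_utf8, phrase_ascii)

# The rule from the report (single dot) and the "double dot" variant.
single_dot = r"volba v.s kontaktovat se vzbudila z geografick. povahy"
double_dot = r"volba v..s kontaktovat se vzbudila z geografick.. povahy"

# Character-wise matching: "." consumes one whole character.
char_re = re.compile(single_dot, re.IGNORECASE)
char_hits = [bool(char_re.search(p)) for p in phrases]

# Byte-wise matching: "." consumes exactly one byte of the UTF-8 encoding.
byte_re = re.compile(single_dot.encode("ascii"), re.IGNORECASE)
byte_hits = [bool(byte_re.search(p.encode("utf-8"))) for p in phrases]

# Byte-wise matching with two dots per accented character.
dotdot_re = re.compile(double_dot.encode("ascii"), re.IGNORECASE)
dotdot_hits = [bool(dotdot_re.search(p.encode("utf-8"))) for p in phrases]

print("character-wise, single dot:", char_hits)
print("byte-wise, single dot:     ", byte_hits)
print("byte-wise, double dot:     ", dotdot_hits)
```

In this sketch the character-wise match catches both variants, the byte-wise
single-dot match catches only the ASCII variant, and the byte-wise double-dot
match catches only the accented variant, which is the same pattern we see
between 3.4.0 and 3.4.1.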

Is there any configuration option (or would it be possible to add one) that
restores the previous behaviour, where a single dot matched both ASCII and
multibyte characters?

Thanks in advance.

-- 
You are receiving this mail because:
You are the assignee for the bug.
