https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8310

            Bug ID: 8310
           Summary: Issue with Matching UTF-8 Anchor Text in URIDetail
                    plugin
           Product: Spamassassin
           Version: 4.0.2
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: Plugins
          Assignee: dev@spamassassin.apache.org
          Reporter: thana...@gmail.com
  Target Milestone: Undefined

Created attachment 5997
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5997&action=edit
sample email

The URIDetail plugin fails to match non-ASCII (UTF-8) characters written in
\x{..} hex notation within anchor text.  ASCII hex escapes (\x6f) and non-hex
escapes (\s) work in uri_detail rules, but hex escapes for UTF-8 bytes
(\x{E0}) do not.  The same escapes work correctly in regular body rules.

The issue is specific to the uri_detail context.  The (?^aa: prefix in the
regex might be related, but removing it doesn't solve the problem.  Even
pasting the raw Unicode character directly into the regex fails.
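
For readers unfamiliar with the distinction, the \x{E0}\x{B8}\x{97} escapes in
this report are the individual UTF-8 bytes of a Thai character, not its Unicode
code point.  The Python sketch below only illustrates how a byte-oriented
pattern behaves against byte data versus decoded character data.  It is not
SpamAssassin code, and whether URIDetail hands decoded text to the regex is an
assumption that this report does not confirm.

```python
import re

# UTF-8 bytes of the rule pattern quoted in this report (15 bytes).
PATTERN_BYTES = bytes.fromhex("e0b897e0b8b1e0b899e0b897e0b8b5")

# UTF-8 bytes of the anchor text quoted in this report (36 bytes).
ANCHOR_BYTES = bytes.fromhex(
    "e0b895e0b988e0b8ade0b8ade0b8b2e0b8a2e0b8b8"
    "e0b897e0b8b1e0b899e0b897e0b8b5"
)

# Decoded (character) forms of the same data.
pattern_text = PATTERN_BYTES.decode("utf-8")
anchor_text = ANCHOR_BYTES.decode("utf-8")

# Case 1: byte pattern against byte data.
bytes_vs_bytes = re.search(re.escape(PATTERN_BYTES), ANCHOR_BYTES) is not None

# Case 2: each pattern byte treated as one character (U+00E0, U+00B8, ...)
# and matched against decoded text.  This models a byte-style regex applied
# to a string that has already been decoded into characters.
pattern_bytes_as_chars = PATTERN_BYTES.decode("latin-1")
bytes_vs_chars = re.search(re.escape(pattern_bytes_as_chars), anchor_text) is not None

# Case 3: decoded pattern against decoded text.
chars_vs_chars = re.search(re.escape(pattern_text), anchor_text) is not None

print("bytes pattern vs bytes text:", bytes_vs_bytes)
print("bytes pattern vs decoded text:", bytes_vs_chars)
print("decoded pattern vs decoded text:", chars_vs_chars)
```

If the second case is what happens inside uri_detail, it would be consistent
with the byte escapes matching in body rules but not in anchor text.  It would
not by itself explain why pasting the raw character also fails.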


uri_detail UNICODE_LINK_TEXT text =~ /\\x{E0}\\x{B8}\\x{97}\\x{E0}\\x{B8}\\x{B1}\\x{E0}\\x{B8}\\x{99}\\x{E0}\\x{B8}\\x{97}\\x{E0}\\x{B8}\\x{B5}/
The anchor text is:
\x{E0}\x{B8}\x{95}\x{E0}\x{B9}\x{88}\x{E0}\x{B8}\x{AD}\x{E0}\x{B8}\x{AD}\x{E0}\x{B8}\x{B2}\x{E0}\x{B8}\x{A2}\x{E0}\x{B8}\x{B8}\x{E0}\x{B8}\x{97}\x{E0}\x{B8}\x{B1}\x{E0}\x{B8}\x{99}\x{E0}\x{B8}\x{97}\x{E0}\x{B8}\x{B5}
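
As a sanity check on the data above, the following Python sketch decodes both
escape sequences and confirms that the rule pattern is a substring of the
anchor text at the character level.  The helper names are hypothetical and
exist only for this illustration.  The escape strings are copied from this
report, written with single backslashes.

```python
import re

# Escape sequences copied from the report (single-backslash form).
RULE_ESCAPES = (
    r"\x{E0}\x{B8}\x{97}\x{E0}\x{B8}\x{B1}\x{E0}\x{B8}\x{99}"
    r"\x{E0}\x{B8}\x{97}\x{E0}\x{B8}\x{B5}"
)
ANCHOR_ESCAPES = (
    r"\x{E0}\x{B8}\x{95}\x{E0}\x{B9}\x{88}\x{E0}\x{B8}\x{AD}"
    r"\x{E0}\x{B8}\x{AD}\x{E0}\x{B8}\x{B2}\x{E0}\x{B8}\x{A2}"
    r"\x{E0}\x{B8}\x{B8}\x{E0}\x{B8}\x{97}\x{E0}\x{B8}\x{B1}"
    r"\x{E0}\x{B8}\x{99}\x{E0}\x{B8}\x{97}\x{E0}\x{B8}\x{B5}"
)


def escapes_to_bytes(escaped: str) -> bytes:
    """Turn a run of \\x{HH} escapes into the raw bytes they denote."""
    return bytes(int(h, 16) for h in re.findall(r"\\x\{([0-9A-Fa-f]{2})\}", escaped))


def to_codepoint_escapes(text: str) -> str:
    """Render each character as a \\x{HHHH} code point escape."""
    return "".join(f"\\x{{{ord(ch):04X}}}" for ch in text)


rule_text = escapes_to_bytes(RULE_ESCAPES).decode("utf-8")
anchor_text = escapes_to_bytes(ANCHOR_ESCAPES).decode("utf-8")

print("rule characters:", len(rule_text))
print("anchor characters:", len(anchor_text))
print("anchor ends with rule text:", anchor_text.endswith(rule_text))
print("rule as code points:", to_codepoint_escapes(rule_text))
```

Both sequences decode as valid UTF-8 Thai text, and the anchor text ends with
the rule text, so the rule would be expected to hit if the pattern and the
anchor text were compared in the same representation.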

-- 
You are receiving this mail because:
You are the assignee for the bug.
