https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8310
Bug ID: 8310
Summary: Issue with Matching UTF-8 Anchor Text in URIDetail plugin
Product: Spamassassin
Version: 4.0.2
Hardware: All
OS: Linux
Status: NEW
Severity: major
Priority: P2
Component: Plugins
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: Undefined
Created attachment 5997
--> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5997&action=edit
sample email
uri_detail rules (URIDetail plugin) fail to match non-ASCII anchor text when
the pattern uses \x{..} hex escapes. ASCII hex escapes (\x6f) and non-hex
escapes (\s) work, but hex escapes for the UTF-8 bytes of non-ASCII
characters (e.g. \x{E0}) never match within uri_detail rules. The same
escapes work correctly in regular body rules.
The issue is specific to the uri_detail context. The (?^aa: prefix in the
regex might be related, but removing it does not solve the problem. Pasting
the raw Unicode characters directly into the regex also fails.
uri_detail UNICODE_LINK_TEXT text =~ /\\x{E0}\\x{B8}\\x{97}\\x{E0}\\x{B8}\\x{B1}\\x{E0}\\x{B8}\\x{99}\\x{E0}\\x{B8}\\x{97}\\x{E0}\\x{B8}\\x{B5}/
The anchor text is
\x{E0}\x{B8}\x{95}\x{E0}\x{B9}\x{88}\x{E0}\x{B8}\x{AD}\x{E0}\x{B8}\x{AD}\x{E0}\x{B8}\x{B2}\x{E0}\x{B8}\x{A2}\x{E0}\x{B8}\x{B8}\x{E0}\x{B8}\x{97}\x{E0}\x{B8}\x{B1}\x{E0}\x{B8}\x{99}\x{E0}\x{B8}\x{97}\x{E0}\x{B8}\x{B5}
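The report does not identify a cause. As a diagnostic sketch only, the
following Python snippet (Python is used purely for illustration; it is not
SpamAssassin code) checks two things using the byte values quoted above: the
rule's byte sequence does occur at the end of the anchor text, and the same
escapes stop matching if the text has been decoded to characters before the
regex runs. Whether uri_detail actually matches against decoded text is an
assumption, not something confirmed in this report.

```python
import re

# Byte values copied from the rule pattern in this report.
pattern_bytes = bytes([
    0xE0, 0xB8, 0x97, 0xE0, 0xB8, 0xB1, 0xE0, 0xB8, 0x99,
    0xE0, 0xB8, 0x97, 0xE0, 0xB8, 0xB5,
])

# Byte values copied from the anchor text in this report
# (the last 15 bytes are identical to the rule pattern).
anchor_bytes = bytes([
    0xE0, 0xB8, 0x95, 0xE0, 0xB9, 0x88, 0xE0, 0xB8, 0xAD,
    0xE0, 0xB8, 0xAD, 0xE0, 0xB8, 0xB2, 0xE0, 0xB8, 0xA2,
    0xE0, 0xB8, 0xB8,
]) + pattern_bytes

# Case 1: byte pattern against raw UTF-8 bytes.
byte_regex = re.compile(re.escape(pattern_bytes))
matches_raw_bytes = byte_regex.search(anchor_bytes) is not None

# Case 2 (ASSUMPTION about the failure mode): the text was decoded to
# characters first, while each escape is still read as a single code point
# (U+00E0, U+00B8, ...). Decoding the pattern as Latin-1 models that reading.
anchor_text = anchor_bytes.decode("utf-8")
char_regex = re.compile(re.escape(pattern_bytes.decode("latin-1")))
matches_decoded_text = char_regex.search(anchor_text) is not None

# Case 3: pattern and text both decoded as UTF-8.
matches_when_both_decoded = pattern_bytes.decode("utf-8") in anchor_text

print(matches_raw_bytes, matches_decoded_text, matches_when_both_decoded)
```

If the assumption holds, the second case would explain why byte-level escapes
match in one context and not in another. It would not explain why pasting the
raw characters into the regex also fails, so this is at most a partial
picture.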