https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8268
Bug ID: 8268
Summary: trim whitespace from anchor text in uri_detail_list
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Hardware: PC
OS: Linux
Status: NEW
Severity: minor
Priority: P2
Component: Libraries
Assignee: dev@spamassassin.apache.org
Reporter: k...@mxguardian.net
Target Milestone: Undefined

It would be convenient if leading and trailing whitespace were removed from anchor_text in uri_detail_list. For example, HTML such as:

    <a href="#"> Download File </a>

will end up with anchor_text containing "\n Download File\n". This leads to unexpected results if you have a rule such as:

    uri_detail RULENAME text =~ /^download file$/i

The workaround is to not use regex anchors, or to explicitly allow whitespace in the regex:

    uri_detail RULENAME text =~ /^\s*download file/i

However, I think this is non-intuitive, and it has tripped me up several times. I don't think there is any harm in removing the whitespace, since the rules of HTML whitespace dictate that the HTML above should parse identically to this HTML:

    <a href="#">Download File</a>

Please see the attached patch and provide feedback.
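
To illustrate the mismatch described above, here is a minimal Python sketch. It is not the attached patch and does not use any SpamAssassin code. It only reproduces the regex behaviour on the reported anchor_text value. The helper name trim_anchor_text is hypothetical, and the patterns are Python translations of the two rule regexes quoted in the report.

```python
import re

# Anchor text as reported: the surrounding whitespace is preserved.
raw_anchor_text = "\n Download File\n"

# Python equivalents of the two rule patterns from the report.
strict = re.compile(r"^download file$", re.IGNORECASE)
workaround = re.compile(r"^\s*download file", re.IGNORECASE)


def trim_anchor_text(text: str) -> str:
    """Hypothetical helper: drop leading and trailing whitespace."""
    return text.strip()


# The anchored pattern fails on the raw text but matches once it is trimmed.
strict_on_raw = strict.search(raw_anchor_text) is not None
workaround_on_raw = workaround.search(raw_anchor_text) is not None
strict_on_trimmed = strict.search(trim_anchor_text(raw_anchor_text)) is not None

print(strict_on_raw, workaround_on_raw, strict_on_trimmed)
```

In both Python and Perl, `$` without the multiline flag still matches just before a single trailing newline, so in this example it is the leading whitespace that defeats the strict pattern.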