https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8268

            Bug ID: 8268
           Summary: trim whitespace from anchor text in uri_detail_list
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: minor
          Priority: P2
         Component: Libraries
          Assignee: dev@spamassassin.apache.org
          Reporter: k...@mxguardian.net
  Target Milestone: Undefined

It would be convenient if leading and trailing whitespace were removed from
anchor_text in uri_detail_list. For example, HTML such as:

<a href="#">
   Download File
</a>

will end up with anchor_text containing "\n   Download File\n". This leads to
unexpected results if you have a rule such as:

uri_detail RULENAME text =~ /^download file$/i

The workaround is to drop the regex anchors, or to allow whitespace explicitly
in the regex:

uri_detail RULENAME text =~ /^\s*download file/i
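
To illustrate why the anchored rule misses while the workaround matches, here
is a minimal sketch that applies both patterns to the anchor text from the
example. It uses Python's re module as a stand-in for the Perl regex engine
used by SpamAssassin (an assumption that holds for these simple patterns), and
the variable names are illustrative only.

```python
import re

# Anchor text as extracted from the HTML example, whitespace included.
anchor_text = "\n   Download File\n"

# Assumption: Python's re stands in for Perl here; for these patterns
# ^, $ and \s behave the same way in both engines.
strict = re.compile(r"^download file$", re.IGNORECASE)
lenient = re.compile(r"^\s*download file", re.IGNORECASE)

strict_hit = strict.search(anchor_text) is not None
lenient_hit = lenient.search(anchor_text) is not None

print(strict_hit, lenient_hit)  # prints: False True
```

The anchored pattern fails because ^ only matches at the very start of the
string, and the first character there is a newline rather than "D".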

However, I find this non-intuitive, and it has tripped me up several times. I
don't think removing the whitespace does any harm, since the rules of HTML
whitespace dictate that the HTML above should be treated the same as this
HTML:

<a href="#">Download File</a>
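
The proposed behaviour can be sketched roughly as follows. This is not the
attached patch, only an illustration in Python: trim_anchor_text and
HTML_WHITESPACE are hypothetical names, and the sketch assumes that only ASCII
whitespace (space, tab, LF, CR, FF) should be stripped, so that a non-breaking
space in the anchor text is left alone.

```python
import re

# Assumption: strip only ASCII whitespace (space, tab, LF, CR, FF).
HTML_WHITESPACE = " \t\n\r\f"


def trim_anchor_text(text: str) -> str:
    """Hypothetical helper: drop leading and trailing ASCII whitespace."""
    return text.strip(HTML_WHITESPACE)


trimmed = trim_anchor_text("\n   Download File\n")

# With the whitespace gone, the fully anchored pattern matches as expected.
rule = re.compile(r"^download file$", re.IGNORECASE)
matched = rule.search(trimmed) is not None

print(repr(trimmed), matched)  # prints: 'Download File' True
```

Passing an explicit character set to strip() is deliberate: a bare strip() in
Python would also remove non-breaking spaces, which HTML does not collapse.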

Please see the attached patch and provide feedback.

