On 2015-12-18 11:19, Paul Stead wrote:
After the messages last night I've been looking into the
redirector_pattern config option - I'm seeing weird results...

Given the redirector_pattern of:

redirector_pattern m'https?://www.google.com/url?q=([^&]+).*'i

I've noticed that spamassassin can sometimes miss - I don't think it's
to do with the regex above as I've tried multiple variations - I've made
it reproducible but I'm not sure why this is happening:

p1: http://pastebin.com/w93ZZp9h
p2: http://pastebin.com/p2pciGC3
[...]
# spaspamassassin -D -t < p2 2>&1 | grep baddomain

p2 doesn't pick up on baddomain.com

Any thoughts or have I stumbled upon a problem?


Two problems there, one is in your regexp, the other is in
the SpamAssassin logic of dealing with redirects.

The parameter of redirector_pattern is a regular expression,
dots and a question mark have a special meaning in a regexp.
If this special meaning is not wanted, they need to be quoted.
Also, SpamAssassin does not add anchoring to a redirector_pattern
regexp, so you need to add it yourself when needed.
And the .* at the end is redundant for this reason.

So it should be something like:

  redirector_pattern m{^https?://www\.google\.com/url\?q=([^&]+)}i


The other problem is in Mail::SpamAssassin::Util::uri_list_canonicalize()
near line 1346:

      # deal with http redirectors.  strip off one level of redirector
      # and add back to the array.  the foreach loop will go over those
      # and deal appropriately.
      # bug 3308: redirectors like yahoo only need one '/' ... <grrr>
      if ($rest =~ m{(https?:/{0,2}.+)$}i) {
        push(@uris, $1);
      }

# resort to redirector pattern matching if the generic https? check
      # doesn't result in a match -- bug 4176
      else {
        foreach (@{$redirector_patterns}) {

Note that it tries the hard-coded check first, and skips
evaluating redirector patterns when the hard-coded match
was successful.

So your redirector pattern was not tried at all, and at the same time
the hard-coded check obtained an invalid URL: from an URL
  "/url?q=http://baddomain.com&t=1";
it collected as "http://baddomain.com&t=1"; instead of "http://baddomain.com";
The URI syntax after a '?' should treat "&t=1" as a second argument
and not glue it with the first argument "http://baddomain.com";.

So the resulting invalid URL "http://baddomain.com&t=1";
does not match any URI rules for baddomain.com .


Please try the attached patch (to 3.4.1 or trunk), it reverses
the order of checks: tries redirector_patterns first, and only
falls back to a sloppy hard-coded check as a last resort.
(and that hard-coded check would better be fixed too)

... and please open a bug report in bugzilla.


Mark
Index: lib/Mail/SpamAssassin/Util.pm
===================================================================
--- lib/Mail/SpamAssassin/Util.pm	(revision 1720791)
+++ lib/Mail/SpamAssassin/Util.pm	(working copy)
@@ -1342,24 +1342,28 @@
       # deal with http redirectors.  strip off one level of redirector
       # and add back to the array.  the foreach loop will go over those
       # and deal appropriately.
-      # bug 3308: redirectors like yahoo only need one '/' ... <grrr>
-      if ($rest =~ m{(https?:/{0,2}.+)$}i) {
-        push(@uris, $1);
-      }
 
-      # resort to redirector pattern matching if the generic https? check
-      # doesn't result in a match -- bug 4176
-      else {
-	foreach (@{$redirector_patterns}) {
-	  if ("$proto$host$rest" =~ $_) {
-	    next unless defined $1;
-	    dbg("uri: parsed uri pattern: $_");
-	    dbg("uri: parsed uri found: $1 in redirector: $proto$host$rest");
-	    push (@uris, $1);
-	    last;
-	  }
-	}
+      # try redirector pattern matching first
+      # (but see also bug 4176)
+      my $found_redirector_match;
+      foreach my $re (@{$redirector_patterns}) {
+        if ("$proto$host$rest" =~ $re) {
+          next unless defined $1;
+          dbg("uri: parsed uri pattern: $re");
+          dbg("uri: parsed uri found: $1 in redirector: $proto$host$rest");
+          push (@uris, $1);
+          $found_redirector_match = 1;
+          last;
+        }
       }
+      if (!$found_redirector_match) {
+        # try generic https? check if redirector pattern matching failed
+        # bug 3308: redirectors like yahoo only need one '/' ... <grrr>
+        if ($rest =~ m{(https?:/{0,2}.+)$}i) {
+          push(@uris, $1);
+          dbg("uri: parsed uri found: $1 in hard-coded redirector");
+        }
+      }
 
       ########################
       ## TVD: known issue, if host has multiple combinations of the following,

Reply via email to