On 2015-12-18 11:19, Paul Stead wrote:
After the messages last night I've been looking into the
redirector_pattern config option - I'm seeing weird results...
Given the redirector_pattern of:
redirector_pattern m'https?://www.google.com/url?q=([^&]+).*'i
I've noticed that spamassassin can sometimes miss - I don't think it's
to do with the regex above as I've tried multiple variations - I've
made
it reproducible but I'm not sure why this is happening:
p1: http://pastebin.com/w93ZZp9h
p2: http://pastebin.com/p2pciGC3
[...]
# spaspamassassin -D -t < p2 2>&1 | grep baddomain
p2 doesn't pick up on baddomain.com
Any thoughts or have I stumbled upon a problem?
Two problems there, one is in your regexp, the other is in
the SpamAssassin logic of dealing with redirects.
The parameter of redirector_pattern is a regular expression,
dots and a question mark have a special meaning in a regexp.
If this special meaning is not wanted, they need to be quoted.
Also, SpamAssassin does not add anchoring to a redirector_pattern
regexp, so you need to add it yourself when needed.
And the .* at the end is redundant for this reason.
So it should be something like:
redirector_pattern m{^https?://www\.google\.com/url\?q=([^&]+)}i
The other problem is in
Mail::SpamAssassin::Util::uri_list_canonicalize()
near line 1346:
# deal with http redirectors. strip off one level of redirector
# and add back to the array. the foreach loop will go over those
# and deal appropriately.
# bug 3308: redirectors like yahoo only need one '/' ... <grrr>
if ($rest =~ m{(https?:/{0,2}.+)$}i) {
push(@uris, $1);
}
# resort to redirector pattern matching if the generic https?
check
# doesn't result in a match -- bug 4176
else {
foreach (@{$redirector_patterns}) {
Note that it tries the hard-coded check first, and skips
evaluating redirector patterns when the hard-coded match
was successful.
So your redirector pattern was not tried at all, and at the same time
the hard-coded check obtained an invalid URL: from an URL
"/url?q=http://baddomain.com&t=1"
it collected as "http://baddomain.com&t=1" instead of
"http://baddomain.com"
The URI syntax after a '?' should treat "&t=1" as a second argument
and not glue it with the first argument "http://baddomain.com".
So the resulting invalid URL "http://baddomain.com&t=1"
does not match any URI rules for baddomain.com .
Please try the attached patch (to 3.4.1 or trunk), it reverses
the order of checks: tries redirector_patterns first, and only
falls back to a sloppy hard-coded check as a last resort.
(and that hard-coded check would better be fixed too)
... and please open a bug report in bugzilla.
Mark
Index: lib/Mail/SpamAssassin/Util.pm
===================================================================
--- lib/Mail/SpamAssassin/Util.pm (revision 1720791)
+++ lib/Mail/SpamAssassin/Util.pm (working copy)
@@ -1342,24 +1342,28 @@
# deal with http redirectors. strip off one level of redirector
# and add back to the array. the foreach loop will go over those
# and deal appropriately.
- # bug 3308: redirectors like yahoo only need one '/' ... <grrr>
- if ($rest =~ m{(https?:/{0,2}.+)$}i) {
- push(@uris, $1);
- }
- # resort to redirector pattern matching if the generic https? check
- # doesn't result in a match -- bug 4176
- else {
- foreach (@{$redirector_patterns}) {
- if ("$proto$host$rest" =~ $_) {
- next unless defined $1;
- dbg("uri: parsed uri pattern: $_");
- dbg("uri: parsed uri found: $1 in redirector: $proto$host$rest");
- push (@uris, $1);
- last;
- }
- }
+ # try redirector pattern matching first
+ # (but see also bug 4176)
+ my $found_redirector_match;
+ foreach my $re (@{$redirector_patterns}) {
+ if ("$proto$host$rest" =~ $re) {
+ next unless defined $1;
+ dbg("uri: parsed uri pattern: $re");
+ dbg("uri: parsed uri found: $1 in redirector: $proto$host$rest");
+ push (@uris, $1);
+ $found_redirector_match = 1;
+ last;
+ }
}
+ if (!$found_redirector_match) {
+ # try generic https? check if redirector pattern matching failed
+ # bug 3308: redirectors like yahoo only need one '/' ... <grrr>
+ if ($rest =~ m{(https?:/{0,2}.+)$}i) {
+ push(@uris, $1);
+ dbg("uri: parsed uri found: $1 in hard-coded redirector");
+ }
+ }
########################
## TVD: known issue, if host has multiple combinations of the following,