https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8206
Bernhard Lichtinger <bernhard.lichtin...@lrz.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bernhard.lichtin...@lrz.de

--- Comment #5 from Bernhard Lichtinger <bernhard.lichtin...@lrz.de> ---
(In reply to stephen-spamassassin from comment #3)
> I can confirm something like:
> <table background [...]
>
> Results in uri:
> background.com
>
> Which happens to be on a URI blacklist, making for some false-positive spam.

Today I was wondering why a regular newsletter was triggering URIBL_BLACK with
"background.com" although there was no such URI in the mail. After some testing
I found the culprit, a malformed HTML tag:

<td class background bgcolor=3D"#F4F7FA" align=3D"center=
" valign=3D"top" style=3D"padding: 0 8px;">

The "=" between "class" and "background" is missing.

I can reproduce this behaviour with:
<td background>
<tr background>
<body background>
<table background>

After some searching through the code I stumbled over "sub html_uri" in HTML.pm:

sub html_uri {
  my ($self, $tag, $attr) = @_;

  # ordered by frequency of tag groups
  if ($tag =~ /^(?:body|table|tr|td)$/) {
    if (defined $attr->{background}) {
      $self->push_uri($tag, $attr->{background});
    }
  }
[...]

=> Without the "=", "background" is treated as an attribute in its own right and
gets pushed onto the uri_list. uri_list_canonicalize then adds "www." and ".com"
to "background".

Debug log:
dbg: uri: canonicalizing html uri: background
dbg: uri: cleaned uri: http://background
dbg: uri: cleaned uri: http://www.background.com
dbg: uri: added host: www.background.com domain: background.com
dbg: uri: cleaned uri: background
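One possible way to guard against this, as a rough sketch only and not SpamAssassin's actual code or fix: HTML::Parser by default reports a valueless attribute with the attribute's own name as its value, which would explain why the debug log shows "background" as the URI. The sketch below skips that case. The names html_uri_guarded and the simplified push_uri are hypothetical stand-ins, and the attribute hashes are written by hand rather than produced by a real parser.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the URI collector; not the real push_uri in HTML.pm.
my @uris;
sub push_uri {
  my ($tag, $uri) = @_;
  push @uris, $uri;
}

# Sketch of a guarded variant of the background check (assumed names).
sub html_uri_guarded {
  my ($tag, $attr) = @_;
  return unless $tag =~ /^(?:body|table|tr|td)$/;

  my $bg = $attr->{background};
  return unless defined $bg && length $bg;

  # A valueless attribute such as <td class background> arrives with its
  # own name as the value, so treat that as "no URI given".
  return if lc($bg) eq 'background';

  push_uri($tag, $bg);
}

# Hand-written attribute hashes, mimicking what the parser would hand over.
html_uri_guarded('td', { class => 'class', background => 'background' });
html_uri_guarded('td', { background => 'http://example.com/bg.png' });

print join("\n", @uris), "\n";   # prints http://example.com/bg.png
```

The trade-off is that a genuine relative URL spelled exactly "background" would be skipped as well. Whether that is acceptable is a judgment call for the maintainers.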