https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8206

Bernhard Lichtinger <bernhard.lichtin...@lrz.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bernhard.lichtin...@lrz.de

--- Comment #5 from Bernhard Lichtinger <bernhard.lichtin...@lrz.de> ---
(In reply to stephen-spamassassin from comment #3)
> I can confirm something like:
> <table background [...]
> 
> Results in uri:
> background.com
> 
> Which happens to be on a URI blacklist, making for some false-positive spam.

Today I was wondering why a regular newsletter triggered URIBL_BLACK on
"background.com" even though no such URI appeared in the mail.

After some testing I found the culprit, a malformed HTML tag:

<td class background bgcolor="#F4F7FA" align="center" valign="top" style="padding: 0 8px;">

The "=" is missing between "class" and "background".

I can reproduce this behaviour with:

<td background>
<tr background>
<body background>
<table background>

After searching through the code I came across "sub html_uri" in
HTML.pm:

sub html_uri {
  my ($self, $tag, $attr) = @_;

  # ordered by frequency of tag groups
  if ($tag =~ /^(?:body|table|tr|td)$/) {
    if (defined $attr->{background}) {
      $self->push_uri($tag, $attr->{background});
    }
  }
[...]

=> Without the "=", "background" is parsed as a valueless attribute whose value
is the attribute name itself, so the string "background" gets pushed onto the
uri_list.
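A minimal sketch of one possible guard, assuming the parser reports a valueless attribute such as <td background> with the attribute's own name as its value (HTML::Parser's default boolean_attribute_value behaviour, and consistent with the debug log below). The helper name background_uri is hypothetical and not part of SpamAssassin; it only illustrates skipping that case before anything is pushed.

```perl
use strict;
use warnings;

# Hypothetical helper (not SpamAssassin code): returns the background URI
# that would be pushed for a tag, or undef if nothing should be pushed.
sub background_uri {
  my ($tag, $attr) = @_;
  return unless $tag =~ /^(?:body|table|tr|td)$/;
  my $bg = $attr->{background};
  return unless defined $bg;
  # Assumption: a valueless attribute (<td background>) arrives with its
  # own name as the value, so treat that case as "no URI at all".
  return if lc($bg) eq 'background';
  return $bg;
}

# <td class background ...> as such a parser would report it
my %malformed  = (class => 'class', background => 'background');
# <td background="bg.png">
my %wellformed = (background => 'bg.png');

for my $case ([\%malformed, 'malformed'], [\%wellformed, 'well-formed']) {
  my $uri = background_uri('td', $case->[0]);
  printf "%s: %s\n", $case->[1], defined $uri ? $uri : '(skipped)';
}
```

The trade-off of this check: a genuine relative URL written as background="background" would also be skipped, since the parser cannot distinguish it from the valueless form.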

uri_list_canonicalize then treats the bare word as a host: it adds "http://",
then "www." and ".com", which yields the domain "background.com".

Debug-Log:
dbg: uri: canonicalizing html uri: background
dbg: uri: cleaned uri: http://background
dbg: uri: cleaned uri: http://www.background.com
dbg: uri: added host: www.background.com domain: background.com
dbg: uri: cleaned uri: background
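To make the log easier to follow, here is a hypothetical re-creation of the steps it shows. This is NOT the real uri_list_canonicalize code; canonicalize_sketch is an illustrative name, and both rules (prepend "http://" to a scheme-less value, then add "www." and ".com" to a dot-less host) are assumptions inferred only from the debug output above.

```perl
use strict;
use warnings;

# Hypothetical illustration of the debug-log steps, not SpamAssassin code.
sub canonicalize_sketch {
  my ($uri) = @_;
  my @out  = ($uri);          # the raw value is kept as well
  my $nuri = $uri;
  # Assumption: a value without a scheme is treated as an http:// URI.
  $nuri = "http://$nuri" unless $nuri =~ m{^[a-z][a-z0-9+.-]*:}i;
  push @out, $nuri;
  # Assumption: a host containing no dot gets "www." and ".com" added.
  if ($nuri =~ m{^(https?://)([^./?#:]+)([/?#:].*)?$}i) {
    my ($scheme, $host, $rest) = ($1, $2, $3 // '');
    push @out, "${scheme}www.$host.com$rest";
  }
  return @out;
}

print "cleaned uri: $_\n" for canonicalize_sketch('background');
```

Fed the bare attribute value "background", this produces "background", "http://background" and "http://www.background.com", the same three strings as in the log, which is how an attribute name ends up looked up as background.com.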

