http://bugzilla.spamassassin.org/show_bug.cgi?id=3439
[EMAIL PROTECTED] changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Additional Comments From [EMAIL PROTECTED] 2004-11-07 22:48 -------
Hrm. Well, the problem was twofold, but after fixing the code bugs and making sure all the hits were valid FPs... the results really suck.
There are 4 different ways URIs come in from HTML parsing: src=, background=, href=, and action= (see HTML::html_uri for more details). I set up some test rules for each type, and one for the total. src is the best spam indicator by S/O, but has a very low hit rate. Everything else hits more on ham. I have no idea why they do it, but there are newsletters that do this for no apparent reason: '<a href="">Copyright</a>' (that was CNET, BTW...). I'm guessing that whatever macro/rewrite/text-vs-HTML editors they use, they don't pay attention to when blank URIs are used.
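A minimal sketch of the per-attribute idea, written in Python rather than SpamAssassin's Perl and not based on HTML::html_uri itself: it counts src, background, href and action attributes whose value is empty, one counter per type, so their sum plays the role of the total rule. The class and function names are hypothetical, and treating whitespace-only or value-less attributes (e.g. `<a href>`) as empty is an assumption.

```python
from html.parser import HTMLParser

# The four URI-bearing attributes listed above (assumed set, mirroring the test rules).
URI_ATTRS = ("src", "background", "href", "action")


class EmptyUriCounter(HTMLParser):
    """Hypothetical counter of URI attributes with empty values, per attribute type."""

    def __init__(self):
        super().__init__()
        self.counts = {attr: 0 for attr in URI_ATTRS}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names; a value-less attribute arrives as None.
        for name, value in attrs:
            # Assumption: whitespace-only values count as empty too.
            if name in self.counts and (value is None or not value.strip()):
                self.counts[name] += 1

    # No override of handle_startendtag is needed: its default implementation
    # calls handle_starttag, so self-closing tags like <img src=""/> are counted.


def count_empty_uris(html):
    """Return {attribute: number of empty occurrences} for one HTML body."""
    parser = EmptyUriCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


if __name__ == "__main__":
    counts = count_empty_uris('<a href="">Copyright</a><img src="x.gif"><form action="">')
    print(counts, "total:", sum(counts.values()))
```

Keeping one counter per attribute rather than a single flag is what makes it possible to see, as in the results below, that only the src case leans toward spam.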
Results from the last 90 days, ~120k mails:
OVERALL%   SPAM%    HAM%    S/O   RANK  SCORE  NAME
   0.085  0.0912  0.0151  0.858   1.00   0.01  T_EMPTY_URI_SRC
   0.293  0.2596  0.6653  0.281   0.33   0.01  T_EMPTY_URI
   0.157  0.1221  0.5443  0.183   0.33   0.01  T_EMPTY_URI_BG
   0.055  0.0537  0.0756  0.415   0.00   0.01  T_EMPTY_URI_HREF
   0.010  0.0087  0.0302  0.224   0.00   0.01  T_EMPTY_URI_ACTION
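The S/O figures in these results are consistent with S/O = SPAM% / (SPAM% + HAM%), so anything under 0.5 hits proportionally more ham than spam, and only the src rule clears that bar. A quick Python check of that arithmetic using the numbers above (the helper name is hypothetical):

```python
# (spam %, ham %) per rule, copied from the results above.
RESULTS = {
    "T_EMPTY_URI_SRC": (0.0912, 0.0151),
    "T_EMPTY_URI": (0.2596, 0.6653),
    "T_EMPTY_URI_BG": (0.1221, 0.5443),
    "T_EMPTY_URI_HREF": (0.0537, 0.0756),
    "T_EMPTY_URI_ACTION": (0.0087, 0.0302),
}


def spam_over_overall(spam_pct, ham_pct):
    """Share of the rule's per-corpus hit rate that comes from spam (> 0.5 leans spam)."""
    return spam_pct / (spam_pct + ham_pct)


if __name__ == "__main__":
    # Print rules from most to least spammy by S/O.
    ranked = sorted(RESULTS.items(), key=lambda kv: spam_over_overall(*kv[1]), reverse=True)
    for name, (spam, ham) in ranked:
        print(f"{spam_over_overall(spam, ham):.3f}  {name}")
```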
The new fix and rules are committed in r56908. We'll see how it works for everyone else, but judging from my results, this really sucks as a spam sign because of the large number of legit newsletters that do this.