http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5725





------- Additional Comments From [EMAIL PROTECTED]  2007-11-18 10:22 -------
I think at a casual glance it might look like a good idea, but I'm not so sure
after spending a bit of time thinking about it. Consider the following points.

1) Implementing this would be "expensive" in that you'd need a database to track
the subjects, probably with some form of atime-based expiry like the bayes
system. This isn't really a case against doing it, but does raise the bar for
how effective it should be. We don't want to be spending a lot of time coding a
feature or occupying disk space with databases unless it's going to be really
effective.

2) Subjects repeat a lot, not just in spam. Consider mailing lists like
spamassassin-users. Just this month there were 25 "Re: It's a fine line..."
subjects (26 if you count the first one without the Re:). Also consider
subscriber newsletters and notifications. Every month I get a lot of emails such
as "Your bill is now available online" (Verizon) and "Your M&T E-statement is now
available REF#:xxxxxxx" (my bank, and the reference is always the same; I have 8
of them on hand to check against). I get these *every* month, and over time the
count piles up. This gets even worse if you consider sysadmin reporting tools
like nagios, which can bombard you with dozens of the same subject a day if part
of your network keeps going up and down.

3) Spammers could easily evade such a system by randomizing subjects if it were
exact-match based. They already randomize body text, so this would be trivial.
If it's not exact-match, see #4.

4) SA's existing bayes system already tokenizes subject lines, which has this
same effect, but on a trained basis, not on a counted basis.
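
To make points 1 through 3 concrete, here is a minimal sketch in Python of what
an exact-match subject counter with atime-based expiry might look like. Nothing
here is SpamAssassin code: the class name SubjectCounter, its methods, the "Re:"
stripping, and the 30-day window are all assumptions made up for illustration.

```python
import time


class SubjectCounter:
    """Hypothetical exact-match subject tracker with atime-based expiry."""

    def __init__(self, max_age_seconds):
        self.max_age_seconds = max_age_seconds
        # normalized subject -> (count, last access time)
        self.entries = {}

    @staticmethod
    def normalize(subject):
        # Assumption: fold case and whitespace, and strip leading "Re:"
        # prefixes so replies are counted together with the original.
        s = " ".join(subject.split()).lower()
        while s.startswith("re:"):
            s = s[3:].lstrip()
        return s

    def record(self, subject, now=None):
        """Count one sighting of a subject and refresh its atime."""
        now = time.time() if now is None else now
        key = self.normalize(subject)
        count, _ = self.entries.get(key, (0, now))
        self.entries[key] = (count + 1, now)
        return count + 1

    def expire(self, now=None):
        """Drop entries not seen within max_age_seconds; return how many."""
        now = time.time() if now is None else now
        stale = [k for k, (_, atime) in self.entries.items()
                 if now - atime > self.max_age_seconds]
        for k in stale:
            del self.entries[k]
        return len(stale)


counter = SubjectCounter(max_age_seconds=30 * 24 * 3600)

# Point 2: a legitimate list thread piles up a high count.
ham_count = counter.record("It's a fine line...", now=0)
for _ in range(25):
    ham_count = counter.record("Re: It's a fine line...", now=0)

# Point 3: spam with a randomized subject never repeats.
spam_counts = [counter.record(f"Cheap meds {i:06d}", now=0)
               for i in range(100)]

print(ham_count)         # 26
print(max(spam_counts))  # 1
```

In this toy run the mailing-list thread scores 26 while every randomized spam
subject scores 1, which is the opposite of what such a rule would want. The
table also grows by one entry per randomized subject until expire() is run,
which is the storage cost from point 1.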

Overall, I'm not sure this is really worth it. It would be difficult to find a
variation of this idea that isn't a duplication of bayes per #4, that's
effective against randomization per #3, doesn't cause FPs per #2, and is
effective enough to be worth it as per #1.

I like that you're submitting ideas, and I encourage you to keep doing so. I just
don't think this one would work out in a broader reality.



