https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7021

--- Comment #4 from Ivo Truxa <[email protected]> ---
(In reply to Kevin A. McGrail from comment #1)
> Add this to cron: 
> 
> DELETE FROM awl WHERE lastupdate <= (now() - INTERVAL 15 day) and count < 5;
> DELETE FROM awl WHERE lastupdate <= (now() - INTERVAL 30 day) and count < 10;
> DELETE FROM awl WHERE lastupdate <= (now() - INTERVAL 60 day) and count < 20;
> DELETE FROM awl WHERE lastupdate <= (now() - INTERVAL 120 day);

BTW, I am not against expiry as such, but I wonder what the motivation is for
expiring AWL records as aggressively as shown above. If someone removes every
sender with fewer than 5 messages once its record has gone 15 days without an
update, and removes absolutely everything older than 4 months, then I wonder
whether the AWL system can be of any help to him at all. In effect, the system
then keeps only fairly recent records of senders who mail almost daily. That
means either regular spammers or hammers, who would probably be better
white/blacklisted in a more consistent way (manually, tuned rules, Bayes). I
may be wrong, but for me the ability of AWL to save an occasional good sender
from a false positive is more important than handling regular senders, for
whom Bayes, rules, and white/blacklists are certainly already tuned correctly.
So by continuously dumping practically the entire AWL database, you lose a
significant part of the functionality AWL provides.
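
To make the effect concrete, here is a rough Python sketch of which rows the
four quoted DELETE statements would keep. The function name survives() and
its parameters are made up for illustration only; age_days stands for the
time since lastupdate and count for the awl.count column.

```python
def survives(age_days: float, count: int) -> bool:
    """Return True if an AWL row would survive the four quoted DELETEs."""
    # (minimum age in days, count below which the row is deleted at that age)
    rules = [(15, 5), (30, 10), (60, 20)]
    for min_age, min_count in rules:
        if age_days >= min_age and count < min_count:
            return False
    # The last statement removes everything not updated for 120 days.
    return age_days < 120


# An occasional correspondent: 3 messages on record, silent for a year.
occasional_sender_kept = survives(365, 3)
# A busy sender: 25 messages on record, last seen 3 months ago.
busy_sender_kept = survives(90, 25)
```

The occasional correspondent is exactly the kind of record that is gone by the
time it would be useful.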

Perhaps on servers handling millions of emails monthly, the size of the
database and its performance become an issue, but unless that is really the
case, I would rather advise keeping the records as long as possible. After
all, customers or friends mailing back after a few years are nothing
exceptional, and they are precisely the ones who can fall victim to false
positives (rules change over time), and for whom a recorded score would help.

I can see one possible reason for expiring all records after 120 days: trying
to make AWL adapt better to continuously changing rules. This is unnecessary
with TxRep, because it works differently. First of all, it can learn from
messages, and auto-learning is also available, so clear spam/ham can be
learned or relearned any time the rules change. Secondly, new messages are
always stored with a higher weight than old ones, so the influence of old
messages vanishes over time automatically, without the need to delete any
records.
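
The weighting idea can be sketched roughly as follows. This is only an
illustration of "new messages weigh more than old ones", not TxRep's actual
code or formula; the names add_message() and reputation() and the dilution
value of 0.9 are assumptions chosen for the example.

```python
def add_message(total, weight, score, dilution=0.9):
    """Fold one new message score into a stored (total, weight) record.

    The stored history is scaled down by `dilution` first, so the new
    message always enters with a higher weight than anything older.
    """
    return total * dilution + score, weight * dilution + 1.0


def reputation(total, weight):
    """Weighted mean score of the sender; 0.0 for an empty record."""
    return total / weight if weight else 0.0


# One old message scored 10 (spammy under old rules), followed by
# 20 messages scoring 0 under the current rules.
total, weight = add_message(0.0, 0.0, 10.0)
for _ in range(20):
    total, weight = add_message(total, weight, 0.0)

decayed = reputation(total, weight)
plain_mean = 10.0 / 21  # what an unweighted average would still report
```

In this sketch the old score fades geometrically with every new message, so
the record corrects itself and never has to be deleted.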
