Ian Zimmerman wrote:
I am very confused by the various features involving expiry from Bayes.
perldoc Mail::SpamAssassin::Conf :
bayes_expiry_max_db_size (default: 150000)
What should be the maximum size of the Bayes tokens
database?
When expiry occurs, the Bayes system will keep either 75% of
the maximum value, or 100,000 tokens, whichever has a larger
value. 150,000 tokens is roughly equivalent to a 8Mb
database file.
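
As a rough illustration of the retention rule quoted above, here is a minimal Python sketch. It mirrors only the documented wording (keep 75% of the maximum or 100,000 tokens, whichever is larger) and is not SpamAssassin's actual expiry code; the function name is hypothetical.

```python
def tokens_kept_after_expiry(max_db_size: int = 150_000) -> int:
    """Token count the documentation says is kept after an expiry run.

    Assumption: this follows the perldoc wording only, i.e. the larger of
    75% of bayes_expiry_max_db_size and a floor of 100,000 tokens.
    """
    seventy_five_percent = max_db_size * 3 // 4  # integer form of 75%
    return max(seventy_five_percent, 100_000)


if __name__ == "__main__":
    # With the default setting, 75% of the maximum is above the floor.
    print(tokens_kept_after_expiry())
    # With a smaller maximum, the 100,000-token floor applies instead.
    print(tokens_kept_after_expiry(120_000))
```

With the default of 150000 the 75% figure applies; for any maximum below roughly 133,334 the 100,000-token floor applies instead.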
bayes_auto_expire (default: 1)
If enabled, the Bayes system will try to automatically expire
old tokens from the database. Auto-expiry occurs when the
number of tokens in the database surpasses the
bayes_expiry_max_db_size value. If a bayes datastore backend
does not implement individual key/value expirations, the
setting is silently ignored.
bayes_token_ttl (default: 3w, i.e. 3 weeks)
Time-to-live / expiration time in seconds for tokens kept in
a Bayes database. A numeric value is optionally suffixed by
a time unit (s, m, h, d, w, indicating seconds (default),
minutes, hours, days, weeks).
If bayes_auto_expire is true and a Bayes datastore backend
supports it (currently only Redis), this setting controls
deletion of expired tokens from a bayes database. The value
is observed on a best-effort basis, exact timing promises are
not necessarily kept. If a bayes datastore backend does not
implement individual key/value expirations, the setting is
silently ignored.
This really sounds as if expiry is a no-op for backends other than
Redis. And yet Debian bug #334829 [1] exists, and has spawned a whole
subculture of solutions and work-arounds. (Sorry for the slight
exaggeration.) Clearly the users reporting these problems do not use
Redis, in fact by all signs they use the default DB backend, as I do.
So should I be worried about the expiry overhead and set up a separate
--force-expire job? I am confused.
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=334829
The redis backend takes advantage of an auto-expiry mechanism of
key/value pairs as provided by a redis server internally (transparently
and automatically), so with this backend the bayes_token_ttl is the
only setting that matters, and SpamAssassin (auto)expiration runs
are not needed; in fact they are a no-op and should not be used.
With other bayes back-ends the traditional expiration mechanisms
need to be used, either auto-expiration runs triggered from time
to time by SpamAssassin, or explicit expiration runs, e.g. from
a cron job. With these traditional back-ends the bayes_token_ttl
setting has no effect.
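
For a non-Redis back-end, one possible arrangement for the explicit expiration runs mentioned above is sketched here. The schedule is an arbitrary assumption, and turning off auto-expiry is only one option (leaving it enabled is equally valid); adjust the file location and the user to match the local setup.

```
# local.cf: optionally turn off expiry triggered during normal operation
bayes_auto_expire 0

# crontab of the user that owns the Bayes database:
# run an explicit expiry once a day at 03:30 (time chosen arbitrarily)
30 3 * * * sa-learn --force-expire
```

This moves the expiry cost out of mail processing and into a scheduled job.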
and has spawned a whole subculture of solutions and work-arounds
Indeed. These mostly pre-date the availability of a Redis back-end.
Mark