Ian Zimmerman wrote:

I am very confused by the various features involving expiry from Bayes.

perldoc Mail::SpamAssassin::Conf :

       bayes_expiry_max_db_size      (default: 150000)

           What should be the maximum size of the Bayes tokens database?
           When expiry occurs, the Bayes system will keep either 75% of
           the maximum value, or 100,000 tokens, whichever has a larger
           value.  150,000 tokens is roughly equivalent to a 8Mb
           database file.

       bayes_auto_expire             (default: 1)

           If enabled, the Bayes system will try to automatically expire
           old tokens from the database.  Auto-expiry occurs when the
           number of tokens in the database surpasses the
           bayes_expiry_max_db_size value. If a bayes datastore backend
           does not implement individual key/value expirations, the
           setting is silently ignored.

       bayes_token_ttl               (default: 3w, i.e. 3 weeks)

           Time-to-live / expiration time in seconds for tokens kept in
           a Bayes database.  A numeric value is optionally suffixed by
           a time unit (s, m, h, d, w, indicating seconds (default),
           minutes, hours, days, weeks).

           If bayes_auto_expire is true and a Bayes datastore backend
           supports it (currently only Redis), this setting controls
           deletion of expired tokens from a bayes database. The value
           is observed on a best-effort basis, exact timing promises are
           not necessarily kept. If a bayes datastore backend does not
           implement individual key/value expirations, the setting is
           silently ignored.

This really sounds as if expiry is a no-op for backends other than
Redis.  And yet Debian bug #334829 [1] exists, and has spawned a whole
subculture of solutions and work-arounds.  (Sorry for the slight
exaggeration.)  Clearly the users reporting these problems do not use
Redis, in fact by all signs they use the default DB backend, as I do.
So should I be worried about the expiry overhead and set up a separate
--force-expire job?  I am confused.
  [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=334829

The Redis back-end takes advantage of the auto-expiry mechanism for
key/value pairs that a Redis server provides internally (transparently
and automatically), so with this back-end bayes_token_ttl is the
only setting that matters, and SpamAssassin (auto)expiration runs
are not needed; in fact they are a no-op and should not be used.

With other bayes back-ends the traditional expiration mechanisms
need to be used, either auto-expiration runs triggered from time
to time by SpamAssassin, or explicit expiration runs, e.g. from
a cron job. With these traditional back-ends the bayes_token_ttl
setting has no effect.
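
As a minimal sketch of the explicit route, assuming a traditional
(non-Redis) back-end and a Bayes database owned by the user whose
crontab is edited: turn off the opportunistic auto-expiry and run
sa-learn --force-expire from cron instead. The schedule below is an
arbitrary example, not a recommendation.

```
# local.cf (or user_prefs): do not trigger expiry runs during
# normal mail scanning
bayes_auto_expire 0

# crontab entry for the user owning the Bayes database:
# run one explicit expiry pass daily at 03:30 (example schedule)
30 3 * * *  sa-learn --force-expire
```

This keeps the expiry overhead out of the scanning path; leaving
bayes_auto_expire at its default of 1 and skipping the cron job is the
other traditional option.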

> and has spawned a whole subculture of solutions and work-arounds

Indeed. These mostly pre-date the availability of a Redis back-end.

  Mark
