https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6879

            Bug ID: 6879
           Summary: Bayes storage module for Redis
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Learner
          Assignee: [email protected]
          Reporter: [email protected]
    Classification: Unclassified

Here's an experimental Redis storage module for Bayes.

I know there is another one out there
(http://sourceforge.net/projects/bayesredis/), but it's incomplete, so I decided
to write this one from scratch anyway.

Token expiration is implemented with Redis's built-in key TTL. I believe that's
simpler and more elegant: what's the point of trying to keep an exact number of
tokens? The TTL can be tuned, and Redis has a "maxmemory" setting as a failsafe.
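
To illustrate the idea of TTL-based expiry, here is a minimal in-memory sketch
in Python. It is not the module's code and it does not talk to Redis. The class
name TtlTokenStore, its methods, and the choice to refresh a token's TTL every
time the token is learned are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlTokenStore:
    """In-memory stand-in for Redis keys with a TTL (illustration only)."""

    def __init__(self, ttl_seconds: int,
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without waiting
        # token -> (spam_count, ham_count, expires_at)
        self._data: Dict[str, Tuple[int, int, float]] = {}

    def _live(self, token: str) -> Optional[Tuple[int, int, float]]:
        """Return the entry if it has not expired, dropping it otherwise."""
        entry = self._data.get(token)
        if entry is None:
            return None
        if entry[2] <= self.clock():
            del self._data[token]
            return None
        return entry

    def learn(self, token: str, spam: int = 0, ham: int = 0) -> None:
        """Add to the counts and restart the TTL (assumed behaviour)."""
        old = self._live(token) or (0, 0, 0.0)
        self._data[token] = (old[0] + spam, old[1] + ham,
                             self.clock() + self.ttl)

    def counts(self, token: str) -> Optional[Tuple[int, int]]:
        """Return (spam, ham) for a live token, or None if absent/expired."""
        entry = self._live(token)
        return None if entry is None else (entry[0], entry[1])
```

No token count is tracked anywhere: a token simply disappears once it has gone
unseen for longer than the TTL, which is the point made above.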

It has been in trial use for a while now on a few servers, up to a very busy
20 million token instance (for which memory usage is around 3-10GB, while the
backup/dump file is a tiny 250MB). Memory usage would probably be lower if we
used hashes (zipmaps etc.), but a TTL cannot be set on individual hash fields,
so secondary indexes and SA involvement would be required. That's something
still to think about.
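
For a rough sense of scale, the figures above can be turned into a per-token
cost. This is only back-of-the-envelope arithmetic, assuming decimal units
(1GB = 10^9 bytes) and exactly 20 million tokens; the helper name
bytes_per_token is made up for the example.

```python
def bytes_per_token(total_bytes: float, tokens: int) -> float:
    """Average storage cost of one token, in bytes."""
    return total_bytes / tokens


TOKENS = 20_000_000   # "20 million token instance"
GB = 10**9            # assumption: decimal gigabytes
MB = 10**6            # assumption: decimal megabytes

low = bytes_per_token(3 * GB, TOKENS)      # lower end of the memory range
high = bytes_per_token(10 * GB, TOKENS)    # upper end of the memory range
dump = bytes_per_token(250 * MB, TOKENS)   # on-disk dump file

print(f"in memory: {low:.0f}-{high:.0f} bytes/token, on disk: {dump:.1f}")
# prints "in memory: 150-500 bytes/token, on disk: 12.5"
```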

Development was done on Redis 2.4, but 2.6 has already been released with new
features like Lua scripting. I haven't had a chance to look at whether there's
anything useful in it.

No "multiuser" support right now, only global. I don't know if it's even needed
or feasible with the memory requirements.

I've temporarily hijacked a few existing config variables for simple testing
(only BayesStorage/Redis.pm is needed).

  bayes_sql_dsn

    Optional config parameters sent as is to Redis->new().
    Example: server=localhost:6379;password=foo
    By default encoding=undef is set as suggested by Redis module.

    To use a non-default database id, use "database=x". This is not passed
    to new(), but is handled specially by calling Redis->select($id).

  bayes_expiry_max_db_size

    Controls token/seen expiry (TTL value in SECONDS, sent as is to Redis).
    Default 150000 (about 41.7 hours) is sane (that's why we abuse this
    variable), but you should try at least 604800 (1 week).
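
As a rough illustration of the two settings above, here is a small Python
sketch of how a "key=value;key=value" string could be split into constructor
options plus a separately handled database id, and how the TTL values convert
to hours. The module itself is Perl; the function names parse_redis_dsn and
ttl_in_hours are hypothetical and only mirror the behaviour described above.

```python
from typing import Dict, Optional, Tuple


def parse_redis_dsn(dsn: str) -> Tuple[Dict[str, str], Optional[int]]:
    """Split 'k=v;k=v' into (constructor options, database id or None)."""
    options: Dict[str, str] = {}
    database: Optional[int] = None
    for part in dsn.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing or doubled ';'
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {part!r}")
        if key == "database":
            # Not a constructor option: kept aside for a later SELECT.
            database = int(value)
        else:
            options[key] = value
    return options, database


def ttl_in_hours(seconds: int) -> float:
    """Convert a TTL given in seconds to hours."""
    return seconds / 3600


opts, db = parse_redis_dsn("server=localhost:6379;password=foo;database=2")
print(opts, db)
print(round(ttl_in_hours(150000), 1), ttl_in_hours(604800) / 24)
```

With the example string, "server" and "password" end up in the options and
"database" is returned on its own, matching the split between new() and
select() described above.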

Any comments or testing are welcome.
