https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7021

--- Comment #6 from Ivo Truxa <[email protected]> ---
(In reply to AXB from comment #5)
> If this is supposed to scale on medium to large sites, imo, the data would
> need to be stored in Redis anyway.

Good point. A Redis handler is welcome, and if one exists for AWL, it will
work without modification for TxRep too. However, Redis is especially helpful
on busy servers with frequent R/W operations, and the requirements of AWL are
several *orders* of magnitude lower than those of the Bayes plugin (a single
lookup/write per message for AWL vs. hundreds to thousands for Bayes). So it
is much less critical for AWL than for Bayes.

Also, the size of the database should not, in my opinion, cause much trouble
even on medium to large servers. Let's assume 100 million emails per year. If
you expire only the entries of senders with a single email during that period,
you get rid of ~90% of the entries (namely most of the spam from random
addresses). The rest will be from more or less regular senders, who will each
require only a single record per 10-100 emails. Hence for the 100M emails, you
will probably need fewer than 1M database entries (and that's the high
estimate; it could be as few as 100,000 or perhaps even fewer). Currently each
entry is ~65 bytes (when using MySQL). That makes 65 MB for a year's worth of
data (1 million entries), plus about the same for the index. That's
ridiculously low, and does not justify any aggressive expiry policy. In my
opinion the lightest expiry, removing entries with count<=1 whose last update
is more than a year or two old, is fully sufficient even for bigger servers.
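
To make the arithmetic above easy to re-run with other numbers, here is a
minimal Python sketch. The function and parameter names are my own and purely
illustrative (nothing here comes from the AWL or TxRep code); the inputs are
the figures assumed in this comment: 100M emails per year, ~10% of them from
regular senders, one record per 10-100 emails, ~65 bytes per entry, and an
index roughly as large as the data.

```python
def estimate_storage(emails_per_year, kept_percent, emails_per_record,
                     bytes_per_entry):
    """Rough yearly storage estimate; all inputs are assumptions."""
    # Emails that survive the "single email per sender" expiry.
    kept_emails = emails_per_year * kept_percent // 100
    # Regular senders share one record across many emails.
    entries = kept_emails // emails_per_record
    data_bytes = entries * bytes_per_entry
    # Assume the index takes about as much space as the data itself.
    total_bytes = 2 * data_bytes
    return entries, data_bytes, total_bytes


# High estimate: one record per 10 emails.
high = estimate_storage(100_000_000, 10, 10, 65)
# Low estimate: one record per 100 emails.
low = estimate_storage(100_000_000, 10, 100, 65)

for label, (entries, data, total) in (("high", high), ("low", low)):
    print(f"{label}: {entries:,} entries, "
          f"{data / 1e6:.1f} MB data, {total / 1e6:.1f} MB with index")
```

With these inputs the high estimate comes out at 1,000,000 entries and 65 MB
of data (130 MB with the index), and the low estimate at 100,000 entries.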

TxRep, depending on the configuration, may need anywhere from 1 entry per
message up to 12 entries per message (mostly far fewer, even with the heaviest
configuration), but that's still far below the requirements of Bayes, and
unlikely to pose any problems.
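
For a feeling of the relative load, the per-message figures from this comment
can be turned into average operations per second. This is only a
back-of-the-envelope sketch: the helper name is mine, the 100M emails per year
is the same assumption as above, and I take 100 and 1000 as illustrative
bounds for "hundreds to thousands" at Bayes. Real traffic is bursty, so peak
rates will be higher than these averages.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def ops_per_second(emails_per_year, ops_per_message):
    """Average storage operations per second for a given per-message cost."""
    return emails_per_year * ops_per_message / SECONDS_PER_YEAR


EMAILS_PER_YEAR = 100_000_000  # assumed volume, as in the size estimate

# Per-message costs; the Bayes values are illustrative bounds only.
workloads = {
    "AWL (1 per message)": 1,
    "TxRep (up to 12 per message)": 12,
    "Bayes (assumed 100 per message)": 100,
    "Bayes (assumed 1000 per message)": 1000,
}

for name, cost in workloads.items():
    print(f"{name}: {ops_per_second(EMAILS_PER_YEAR, cost):.2f} ops/s")
```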

-- 
You are receiving this mail because:
You are the assignee for the bug.
