https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7021

--- Comment #12 from AXB <[email protected]> ---
(In reply to Ivo Truxa from comment #6)
> (In reply to AXB from comment #5)
> > If this is supposed to scale on medium to large sites, imo, the data would
> > need to be stored in Redis anyway.
> 
> Good point. A Redis handler is welcome, and if one exists for AWL, it will
> work without modification for TxRep too. However, Redis is especially
> helpful on busy servers with frequent R/W operations, and the requirements
> of AWL are several *orders* of magnitude lower than those of the Bayes
> plugin (a single lookup/write per message for AWL vs. hundreds to thousands
> for Bayes). So it is much less critical for AWL than for Bayes.
> 
> Also, the size of the database should not, in my opinion, pose much trouble
> even on medium to large servers. Let's assume 100 million emails per year.
> If you expire only the entries of senders with a single email during that
> period, you get rid of ~90% of the entries (namely most of the spam from
> random addresses). The rest will be from more or less regular senders, who
> will each require only a single record per 10-100 emails. Hence for the 100M
> emails, you will probably need fewer than 1M database entries (and that's
> the high estimate; it could be as few as 100,000 or perhaps even fewer).
> Currently each entry is ~65 bytes (when using MySQL). That makes 65 MB for a
> year's worth of data (1 million entries), plus about the same for the index.
> That's ridiculously low and does not justify any aggressive expiry policy.
> In my opinion, the lightest expiry, removing entries with count<=1 and a
> now-lastupdate interval of more than a year or two, is fully sufficient
> even for bigger servers.
> 
> TxRep, depending on the configuration, may likewise need 1 entry per email,
> or up to 12 entries per message (mostly far fewer, even with the heaviest
> configuration), but that's still far below the requirements of Bayes, and
> unlikely to pose any problems.
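
For anyone who wants to plug in their own numbers, the storage estimate and
the light expiry rule quoted above can be sketched roughly as follows. This is
only an illustration: the function names, the 1:1 index-to-data ratio (taken
from "about the same for the index"), and the 365-day default are assumptions,
not anything from the AWL or TxRep code.

```python
from datetime import datetime, timedelta

ENTRY_BYTES = 65  # per-entry size quoted in the thread for MySQL


def estimate_storage_mb(entries, entry_bytes=ENTRY_BYTES, index_factor=1.0):
    """Return data plus index size in MB (1 MB = 10**6 bytes).

    index_factor is an assumption: the index is taken to be that multiple
    of the data size (1.0 means "about the same as the data").
    """
    data_bytes = entries * entry_bytes
    return data_bytes * (1 + index_factor) / 1_000_000


def is_expirable(count, last_update, now, max_age=timedelta(days=365)):
    """Light expiry rule: sender seen at most once and not updated for
    longer than max_age (hypothetical default of one year)."""
    return count <= 1 and (now - last_update) > max_age


if __name__ == "__main__":
    # High and low entry-count estimates from the quoted comment.
    for entries in (1_000_000, 100_000):
        print(entries, "entries:", estimate_storage_mb(entries), "MB")
```

With the quoted figures, 1 million entries come to 65 MB of data, or 130 MB
once an equally sized index is counted.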

In a cluster of spamd boxes, sharing the data is a must.
SQL is a speed hog and a file-based DB/server is not an option, so Redis seems
the ideal backend.

I'd really like to test this, but it will have to include Redis support.
