https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7021
--- Comment #12 from AXB ---
(In reply to Ivo Truxa from comment #6)
> (In reply to AXB from comment #5)
> > If this is supposed to scale on medium to large sites, imo, the data would
> > need to be stored in Redis anyway.
>
> Good point. A Redis handler is welcome, and if there is one for AWL, it
> will work without modifications for TxRep too. However, Redis is
> especially helpful on busy servers with frequent R/W operations, and the
> requirements of AWL are several *orders of magnitude* lower than those of
> the Bayes plugin (a single lookup/write per message for AWL vs. hundreds to
> thousands for Bayes). So it is much less critical for AWL than for Bayes.
>
> Also, in my opinion, the size of the database should not pose much trouble
> even on medium to larger servers. Let's assume 100 million emails per
> year. If you expire only the entries of senders with a single email during
> that period, you get rid of ~90% of the entries (namely most of the spam
> from random addresses); the rest will be from more or less regular senders,
> who will each require only a single record per 10-100 emails. Hence for the
> 100M emails, you will probably need fewer than 1M database entries (and
> that's the high estimate; it could be as few as 100,000 or perhaps even
> less). Currently each entry is ~65 B (when using MySQL). That makes 65 MB
> for a year's worth of data (1 million entries), plus about the same for the
> index. That's ridiculously low and does not justify any aggressive expiry
> policy. In my opinion, the lightest expiry, removing entries with
> count <= 1 whose last update is more than a year or two old, is fully
> sufficient even for bigger servers.
>
> TxRep, depending on the configuration, may need either 1 entry per email
> or up to 12 entries per message (usually far fewer, even with the heaviest
> configuration), but that's still far below the requirements of Bayes, and
> likely nothing that would pose any problems.
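The storage estimate in the quoted comment is easy to re-check. The sketch below is a hypothetical helper, not part of SpamAssassin: it takes the ~65 B per row and "index about the same size as the data" figures from the comment, and encodes the suggested light expiry rule (count <= 1 and no update for one to two years). Function and parameter names are illustrative assumptions, not the actual AWL/TxRep column names.

```python
from datetime import datetime, timedelta

# Figures taken from the quoted comment (assumptions, not measured here):
# ~65 bytes per row in MySQL, index roughly as large as the table itself.
BYTES_PER_ENTRY = 65
INDEX_FACTOR = 1.0


def storage_bytes(entries: int,
                  bytes_per_entry: int = BYTES_PER_ENTRY,
                  index_factor: float = INDEX_FACTOR) -> int:
    """Approximate table + index size in bytes for a given entry count."""
    return round(entries * bytes_per_entry * (1 + index_factor))


def should_expire(count: int, last_update: datetime, now: datetime,
                  max_age: timedelta = timedelta(days=365)) -> bool:
    """Light expiry rule from the comment: drop only one-off senders
    (count <= 1) whose entry has not been updated for longer than max_age.
    The parameter names are hypothetical, not the real schema."""
    return count <= 1 and (now - last_update) > max_age


if __name__ == "__main__":
    # High (1M) and low (100,000) entry estimates from the comment.
    for entries in (1_000_000, 100_000):
        mb = storage_bytes(entries) / 1_000_000
        print(f"{entries:>9,} entries -> ~{mb:.0f} MB incl. index")
```

Run as a script, it reports about 130 MB for 1M entries and 13 MB for 100,000 entries (table plus index), matching the "65 MB plus about the same for the index" figure. Passing `max_age=timedelta(days=730)` gives the more lenient two-year variant.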
In a cluster of spamd boxes, sharing the data is a must. SQL is a speed hog
and a file-based DB/server is not an option, so Redis seems the ideal
backend. I'd really like to test this, but it will have to include Redis
support.

--
You are receiving this mail because:
You are the assignee for the bug.
