https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7021
--- Comment #6 from Ivo Truxa <[email protected]> ---
(In reply to AXB from comment #5)
> If this is supposed to scale on medium to large sites, imo, the data would
> need to be stored in Redis anyway.

Good point. A Redis handler is welcome, and if one already exists for AWL, it will work with TxRep without modification.

However, Redis helps most on busy servers with frequent R/W operations, and the requirements of AWL are several *orders* of magnitude lower than those of the Bayes plugin (a single lookup/write per message for AWL vs. hundreds to thousands for Bayes). So it is much less critical for AWL than for Bayes.

Also, the size of the database should not, in my opinion, cause much trouble even on medium to large servers. Let's assume 100 million emails per year. If you expire only the entries of senders with a single email during that period, you get rid of ~90% of the entries (namely most of the spam from random addresses). The rest will be from more or less regular senders, who will each need only a single record per 10-100 emails. Hence for 100M emails you will probably need fewer than 1M database entries (and that's the high estimate; it could be as few as 100,000 or perhaps even fewer).

Currently each entry is ~65 bytes (when using MySQL). That makes 65 MB for a year's worth of data (1 million entries), plus about the same for the index. That's ridiculously low and does not justify any aggressive expiry policy. In my opinion, the lightest expiry, removing only entries with count <= 1 whose lastupdate is more than a year or two old, is fully sufficient even for bigger servers.

TxRep, depending on the configuration, may need either 1 entry per email as well, or up to 12 entries per message (mostly far fewer, even with the heaviest configuration), but that's still far below the requirements of Bayes, and unlikely to pose any problems.
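
To make the size estimate above easy to rerun with other numbers, here is a minimal Python sketch of the arithmetic together with the light expiry rule. The constants are only the rough figures quoted in this comment, not measurements, and the function names `table_size_mb` and `is_expirable` are hypothetical helpers, not part of SpamAssassin, AWL, or TxRep. The one-year default in `is_expirable` is an assumption taken from the lower end of "a year or two".

```python
# Back-of-the-envelope estimate of the AWL/TxRep table size.
# All constants are the rough figures from the comment, not measurements.

ENTRIES_HIGH = 1_000_000   # high estimate of entries left after light expiry
ENTRIES_LOW = 100_000      # low estimate
BYTES_PER_ENTRY = 65       # approximate row size with MySQL
INDEX_FACTOR = 2           # index assumed to be about as large as the data


def table_size_mb(entries, bytes_per_entry=BYTES_PER_ENTRY, index_factor=INDEX_FACTOR):
    """Return (data_mb, total_mb) in decimal megabytes (1 MB = 10**6 bytes)."""
    data_mb = entries * bytes_per_entry / 1_000_000
    return data_mb, data_mb * index_factor


def is_expirable(count, age_days, max_age_days=365):
    """Light expiry rule: seen at most once and not updated for max_age_days."""
    return count <= 1 and age_days > max_age_days


if __name__ == "__main__":
    for label, entries in (("high", ENTRIES_HIGH), ("low", ENTRIES_LOW)):
        data_mb, total_mb = table_size_mb(entries)
        print(f"{label}: {entries} entries, {data_mb:.1f} MB data, "
              f"{total_mb:.1f} MB with index")
```

With the high estimate this gives 65 MB of data and about 130 MB including the index; with the low estimate, 6.5 MB and 13 MB.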
