https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7021
--- Comment #7 from Ivo Truxa <[email protected]> ---
(In reply to Ivo Truxa from comment #6)
> ... That makes 65 MB for a year
> worth of data (1 million entries). Plus about the same for the index. That's
> ridiculously low, and does not justify any aggressive expiry policy.

In fact I exaggerated in the previous post: the data volume mentioned above
represents only the yearly increase, not the full volume. It does not reflect
collecting one year of data with no expiry at all. In that case the database
would of course be much bigger. My apologies for the misinformation.

On the other hand, I still believe that expiring the database as little as
possible can only benefit the functionality. And I would still like to
understand the motivation behind the aggressive expiry policy mentioned
earlier. I'd like to know whether it is done because of volume/performance
concerns, or rather because AWL misbehaves when the data is kept too long. The
latter is certainly true, because AWL tends to suppress any improvements in SA
rules, in Bayes training, and in manual white/blacklisting.

If the size of the database was the concern, then with TxRep it may be even
worse. In the minimal configuration (no dual storage, no message tracking, and
all reputation weights except EMAIL_IP set to zero), the requirements of TxRep
are identical to those of AWL. In the full configuration, however, the TxRep
database may be an order of magnitude larger than an equivalent AWL database.

If the main reason for the rapid expiry was the misbehavior of AWL, then the
expiry can safely be made much more relaxed, because TxRep reflects changes in
the system better, especially thanks to its learning capability and its aging
feature.
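
For anyone who wants to redo the arithmetic behind the quoted estimate, here is a rough sketch. It only reproduces the figures from comment #6 (65 MB of data per 1 million entries, plus about the same again for the index). The function name, the parameter names, and the derived value of 65 bytes per entry (assuming 1 MB = 10^6 bytes) are my own assumptions for illustration. They are not taken from the AWL or TxRep code, and the result is the yearly growth only, not the total database size.

```python
def yearly_growth_mb(entries_per_year, bytes_per_entry=65, index_overhead=1.0):
    """Estimate yearly database growth in MB (1 MB = 10**6 bytes).

    bytes_per_entry=65 is derived from the quoted "65 MB for 1 million
    entries"; index_overhead=1.0 models "about the same for the index".
    Both are illustrative assumptions, not measured values.
    """
    data_bytes = entries_per_year * bytes_per_entry
    total_bytes = data_bytes * (1.0 + index_overhead)
    return total_bytes / 1_000_000


# Yearly growth for the 1 million entries per year quoted above
print(yearly_growth_mb(1_000_000))
```

With other traffic levels or a larger per-entry record (for example a fuller TxRep configuration), only the two input values need to be changed.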
