https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7021

--- Comment #7 from Ivo Truxa ---
(In reply to Ivo Truxa from comment #6)
> ... That makes 65 MB for a year
> worth of data (1 million entries). Plus about the same for the index. That's
> ridiculously low, and does not justify any aggressive expiry policy. 

In fact I exaggerated in the previous post, since the above-mentioned data
volume represents the yearly increase, not the full volume. It does not
reflect collecting one year of data with no expiry at all. In that case the
volume of the database would be much bigger, of course. My apologies for the
misinformation.
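
For reference, the arithmetic behind the quoted figures can be sketched as
follows. This is only an illustration of the numbers quoted from comment #6
(65 MB of data per 1 million entries, plus about the same for the index). The
function names are hypothetical, and 1 MB is assumed to mean 10**6 bytes.

```python
# Figures quoted from comment #6; everything else here is an assumption.
DATA_MB_PER_YEAR = 65          # data volume for one year of entries
ENTRIES_PER_YEAR = 1_000_000   # number of entries in that year
BYTES_PER_MB = 10**6           # assumption: decimal megabytes


def bytes_per_entry(data_mb=DATA_MB_PER_YEAR, entries=ENTRIES_PER_YEAR):
    """Average storage per entry, excluding the index."""
    return data_mb * BYTES_PER_MB / entries


def yearly_growth_mb(data_mb=DATA_MB_PER_YEAR, index_ratio=1.0):
    """Yearly increase including the index.

    index_ratio=1.0 models "about the same for the index".
    """
    return data_mb * (1 + index_ratio)


print(bytes_per_entry())   # 65.0 bytes per entry
print(yearly_growth_mb())  # 130.0 MB per year, data plus index
```

These values describe the yearly increase only, not the size of a database
kept for a year with no expiry.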

On the other hand, I still believe that expiring the database as little as
possible can only benefit the functionality. And I would still like to
understand the motivation behind the aggressive expiry policy mentioned
earlier. I'd like to know whether it is done because of volume/performance
concerns, or rather because AWL misbehaves when the data is kept too long. The
second is certainly true, because AWL tends to suppress any improvements in SA
rules, in Bayes training, and in manual white/blacklisting.

If the size of the database was the concern, then it may be even worse with
TxRep. In the minimal configuration (no dual-storage, no message tracking, and
all reputation weights except EMAIL_IP set to zero), the requirements of
TxRep are identical to those of AWL, but in the full configuration, the size of
the TxRep database may be an order of magnitude larger than that of an
equivalent AWL database.

If the main reason for the rapid expiry was the misbehavior of AWL, then the
expiry can safely be made much more relaxed, because TxRep reflects changes in
the system better, especially because of its learning capability, and also
because of the aging feature.
