https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6386
--- Comment #3 from Darxus 2011-10-28 17:02:59 UTC ---

The current corpus age limits for score generation are:

  Ham: 6 years
  Spam: 2 months

So should we reduce the limit for ham? To what?

Score generation requires a minimum of 150,000 hams. The 150,000th-newest ham submitted on 2011-10-22 (including the bb corpora) was dated Tue Apr 17 09:33:16 UTC 2007, about 4.5 years (4 years, 6 months) earlier.

29.8% of the ham currently used in score generation dates from 2008 or earlier and comes from jm's corpus. So I think it's important to fix the problem with adding new masscheck accounts, and to get more data from more people.

It looks like the place to change this limit is rulesrc/sandbox/dos/new-rule-score-gen/generate-new-scores, in the arguments to log-grep-recent (lines 172-173):

  masses/log-grep-recent -m 72 ../corpus/usable-corpus-set$SCORESET/ham-*.log > masses/ham-full.log
  masses/log-grep-recent -m 2 ../corpus/usable-corpus-set$SCORESET/spam-*.log > masses/spam-full.log

And ruleqa should be changed to match, in masses/rule-qa/reports-from-logs (lines 36-37):

  my $OLDEST_HAM_WEEKS = 72 * 4; # 72 months = 6 years
  my $OLDEST_SPAM_WEEKS = 2 * 4; # 2 months
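As a quick check of the age arithmetic above, the sketch below measures the gap between the 2011-10-22 submission and the 150,000th-newest ham, and the smallest whole-month limit that still reaches that ham. The helper names are hypothetical, and it assumes a limit of N months reaches back N calendar months from the snapshot date; how log-grep-recent actually counts months is not shown here.

```python
from datetime import date

def age_in_years(oldest: date, snapshot: date) -> float:
    """Elapsed time in mean Gregorian years (365.2425 days)."""
    return (snapshot - oldest).days / 365.2425

def months_needed(oldest: date, snapshot: date) -> int:
    """Hypothetical helper: smallest whole number of calendar months,
    counted back from `snapshot`, that still reaches `oldest`."""
    months = (snapshot.year - oldest.year) * 12 + (snapshot.month - oldest.month)
    if snapshot.day > oldest.day:
        months += 1  # the leftover partial month must also be covered
    return months

snapshot = date(2011, 10, 22)       # corpus submission date from the comment
oldest_needed = date(2007, 4, 17)   # date of the 150,000th-newest ham

print((snapshot - oldest_needed).days,
      round(age_in_years(oldest_needed, snapshot), 2),
      months_needed(oldest_needed, snapshot))
# prints: 1649 4.51 55
```

Under that assumption, any ham limit shorter than about 55 months would have fallen below the 150,000-ham minimum at that snapshot.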

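The ruleqa limits in reports-from-logs are expressed in weeks, converting months at 4 weeks each. Since an average month is about 4.35 weeks, 72 * 4 = 288 weeks is roughly 5.5 years rather than 6. Whether the two scripts then disagree depends on how log-grep-recent interprets -m, which is not shown here. The sketch below, with hypothetical helper names and an assumed average month of 365.2425 / 12 days, only compares the two conversions.

```python
# Assumption: an average Gregorian month, not taken from log-grep-recent.
AVG_MONTH_DAYS = 365.2425 / 12  # about 30.44 days

def weeks_via_4_per_month(months: int) -> int:
    """Weeks as reports-from-logs computes them: months * 4."""
    return months * 4

def weeks_via_calendar(months: int) -> float:
    """Weeks spanned by `months` average calendar months."""
    return months * AVG_MONTH_DAYS / 7

for label, months in (("ham", 72), ("spam", 2)):
    w4 = weeks_via_4_per_month(months)
    wc = weeks_via_calendar(months)
    print(f"{label}: {months} months -> {w4} weeks (x4) vs {wc:.1f} weeks (calendar), "
          f"i.e. {w4 * 7 / 365.2425:.2f} vs {months / 12:.2f} years")
```

If the ham limit is changed in generate-new-scores, recomputing the weeks value this way makes it easy to see how far the "months * 4" shortcut drifts from the intended span.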