https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155
--- Comment #42 from Daryl C. W. O'Shea <[email protected]> 2009-09-22 20:12:25 PDT ---
I've uploaded my results, but they don't have bayes enabled. Why, again,
aren't we reusing bayes results? I've kicked off another round with bayes
enabled (my net-enabled check took 13.4 hours); I'm waiting on timing to see
how long it'll take. I may have to set up a SQL server on the cluster to do it
in a reasonable amount of time.

In any case, I don't think we have enough message results contributed yet for
a good scoreset. We have way fewer than for 3.2.0, although from a larger
number of contributors. Is there any chance we might see results from Theo?

(In reply to comment #15)
> Should I bother to continue recruiting more masscheck participants after this
> rescore?

I would. A larger number of people submitting from *clean* corpora will allow
us to provide updated scores more often. As it is, the scores I'm generating
now (well, broken right now, but I'll fix it soon) swing quite a bit. I suspect
it's due to not enough submitters and not enough messages.

(In reply to comment #17)
> > the base ruleset (non-sandbox rules) won't change scores, so this is
> > important.
> > For nightly masschecks, the only scores affected will be those of sandbox
> > rules. So only about 1/2 of the ruleset, I'd reckon.
>
> I am curious, do you remember the original reason for this design decision?

I felt that we didn't have a large enough nightly/weekly corpus to reliably
change all of the scores. I could generate two versions of the scores... with
and without locking the base set of scores.

> Might there be value in making the entire ruleset scores affected by nightly
> masschecks?

I think we need a larger nightly/weekly corpus before we do this.

(In reply to comment #18)
> iirc, the risk is that a small set of corpora (e.g. a few people take a week
> off) could cause the entire ruleset to be skewed incorrectly. This way at
> least only the most recent (sandbox) rules would be affected, so it's a bit
> safer.

Even when all of the regular contributors submitted their results the corpus
wasn't that large, so I didn't want to throw away the scores based on the much
larger corpus we had for 3.2.0.

> It's also faster to generate the scores, but this isn't so much of an issue
> now, as our main machine is quite beefy...

I can do it either way... cycles weren't an issue.

> There may have been other reasons, too, but I can't find the mails :(

I probably only sent one about the topic. There are some terse comments in the
commit messages for that code.

(In reply to comment #25)
> Daryl, is there a URL to your weekly scores?

Still a little broken on my end, but:
http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/scores/
