On 03/30, Adam Katz wrote:
> Be careful about measuring the usefulness of that data; you'll have to
> measure samples against each other, and even then you will have
> imperfect results.
If this ever gets added to the mass-check tests, I'll be more than happy to create a separate set of the data based only on data from people who are not contributing to mass-checks. Right now, I only have data from 1,796 emails that aren't run through mass-check, so it's not worth it yet. But I'm keeping all input data separated by who contributed it, so producing a special version for the mass-check folks will be easy.

I just posted some test results to the users list that I'm pretty happy with. I'd really like to get more data, though.

Graph of the results: http://www.chaosreigns.com/iprep/results.svg

The graph is based on training on all corpora except mine, then training on my own mail one spam and one ham at a time, and calculating the accuracy at each step against a separate test set of my email. There are three sets of lines, from three runs using randomly selected training and scoring sets.

Project web page: http://www.chaosreigns.com/iprep/

-- 
"I don't want to die... just yet... not while there's... women."
 - J. Matthew Root, 8/23/02 (http://www.jmrart.com/)
http://www.ChaosReigns.com
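For anyone who wants to reproduce the shape of that graph on their own mail, here is a minimal Python sketch of the evaluation procedure described above: train on everyone else's corpora, then add your own mail one spam and one ham at a time, scoring a held-out test set after each step, repeated over several random splits. This is not the iprep code. `ToyIPModel`, `learning_curve`, and `run_trials` are hypothetical names, the model is a stand-in that only counts spam and ham per sending IP, and the 50/50 train/test split is an assumption.

```python
import random
from collections import defaultdict


class ToyIPModel:
    """Hypothetical stand-in classifier: per-IP ham/spam counts.
    NOT the iprep algorithm; any trainable model could be swapped in."""

    def __init__(self):
        self.counts = defaultdict(lambda: [0, 0])  # ip -> [ham, spam]

    def train(self, ip, is_spam):
        self.counts[ip][int(is_spam)] += 1

    def predict(self, ip):
        # Unseen IPs default to ham (0 spam is not > 0 ham).
        ham, spam = self.counts.get(ip, (0, 0))
        return spam > ham


def accuracy(model, test_set):
    """Fraction of (ip, is_spam) pairs the model labels correctly."""
    if not test_set:
        raise ValueError("test set must not be empty")
    correct = sum(model.predict(ip) == is_spam for ip, is_spam in test_set)
    return correct / len(test_set)


def learning_curve(base, own_spam, own_ham, test_set, seed=None):
    """Train on other people's corpora (base), then add own mail one spam
    and one ham at a time, recording test accuracy after each pair."""
    model = ToyIPModel()
    for ip, is_spam in base:
        model.train(ip, is_spam)
    rng = random.Random(seed)
    spam, ham = own_spam[:], own_ham[:]
    rng.shuffle(spam)
    rng.shuffle(ham)
    curve = [accuracy(model, test_set)]  # accuracy before any own mail
    for spam_ip, ham_ip in zip(spam, ham):
        model.train(spam_ip, True)
        model.train(ham_ip, False)
        curve.append(accuracy(model, test_set))
    return curve


def run_trials(base, own, n_runs=3, test_fraction=0.5):
    """Repeat with randomly selected training and scoring sets drawn
    from own mail; returns one accuracy curve per run."""
    curves = []
    for seed in range(n_runs):
        rng = random.Random(seed)
        shuffled = own[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * test_fraction)
        test_set, train_set = shuffled[:cut], shuffled[cut:]
        spam = [ip for ip, s in train_set if s]
        ham = [ip for ip, s in train_set if not s]
        curves.append(learning_curve(base, spam, ham, test_set, seed))
    return curves
```

Each curve stops when either your spam or your ham runs out, since the steps are paired; plotting all `n_runs` curves on one chart gives the "three sets of lines" view.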
