Justin Mason <[EMAIL PROTECTED]> writes:

> sounds good to me.  One thing - could you run some tests on the
> sampling so we can see how reliable it is, in terms of
> hit-frequencies?  I'd like to get a "sanity check" on that, it's a key
> aspect.

Yes.  I will do so.  I'm going to compare auto-learning with
sample-learning using a similar percentage of messages learned.  If
sample-learning turns out to be too good, it should be easy enough to
bring it to a similar error rate.

Another thing I want to do is make it deterministic instead of using
rand(100).  If I change it to "learn 1 in N" instead of a percentage,
then I can easily do a mod on the md5sum of the id and/or date.
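
A minimal sketch of the "learn 1 in N" idea, written in Python purely
for illustration.  The function name should_learn, the parameter n,
and the choice of the Message-ID string as the hash key are
assumptions, not the actual implementation; the date could be
appended to the key in the same way.

```python
import hashlib


def should_learn(message_id: str, n: int) -> bool:
    """Deterministically select roughly 1 in n messages.

    Hypothetical helper: hashes the message id with MD5 and takes the
    digest modulo n, so the same message always gets the same answer.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    digest = hashlib.md5(message_id.encode("utf-8")).hexdigest()
    # Interpret the 128-bit digest as an integer and reduce it mod n.
    return int(digest, 16) % n == 0


if __name__ == "__main__":
    # Made-up ids, used only to show the selection rate.
    ids = [f"<{i}@example.com>" for i in range(10000)]
    picked = sum(should_learn(mid, 10) for mid in ids)
    print(f"learned {picked} of {len(ids)} messages")
```

Unlike rand(100), rerunning this over the same corpus selects the same
messages every time, which makes test runs repeatable.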

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/
