That helps a bit...

I'm using the second method from your example config, but I only ever get an
index of 0.308, even though I have 9,000+ terms in the spam table and 3,000+
in the ham table.
I also have no counts in messages, and nothing in corpus.
What am I missing?
I've been adding mail using the spam@ and notspam@ mailets.
Have I completely missed the trick here?!

d.



> -----Original Message-----
> From: Chris Means [mailto:[EMAIL PROTECTED]]
> Sent: 28 January 2003 15:00
> To: James Users List
> Subject: RE: Spam filtering mailets wanted...[spam 0.308]
> 
> 
> Hi Danny,
> 
> > I'm slightly confused as to config..
> > what is the idea with the two methods..
> >
> >      Method 1:
> >        Load the corpus directly.
> >        This relies on outside processes to build and maintain
> >        the corpus.
> >
> >      Method 2:
> >        Load the ham & spam tokens/counts and rebuild the corpus.
> >        This relies on outside processes to maintain the ham & spam
> >        token/counts tables.
> 
> The routines have two "data stages", where their state can be preserved or
> not as desired.
> 
> Stage 1.  Email messages (Ham & Spam) need to be analyzed to determine
> token counts.
> 
> Stage 2.  Those token counts are used to build the corpus.
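
The two stages above can be sketched roughly as follows. This is only an
illustration in Python, not the mailets' actual code: it assumes a
Graham-style ("A Plan for Spam") probability formula, and the names
count_tokens, build_corpus, and TOKEN_RE are hypothetical. In this sketch,
stage 2 needs the ham and spam message counts as well as the token counts.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of letters, digits, $, apostrophes and dashes.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def count_tokens(messages):
    """Stage 1: count how often each token occurs across the messages."""
    counts = Counter()
    for text in messages:
        counts.update(TOKEN_RE.findall(text))
    return counts


def build_corpus(ham_counts, spam_counts, n_ham, n_spam):
    """Stage 2: turn token counts into per-token spam probabilities.

    Assumed Graham-style formula; n_ham and n_spam are message counts.
    """
    if n_ham == 0 or n_spam == 0:
        raise ValueError("message counts must be non-zero to build a corpus")
    corpus = {}
    for token in set(ham_counts) | set(spam_counts):
        g = 2 * ham_counts.get(token, 0)  # ham occurrences are weighted double
        b = spam_counts.get(token, 0)
        if g + b < 5:
            continue  # too rare to say anything useful about
        ham_ratio = min(1.0, g / n_ham)
        spam_ratio = min(1.0, b / n_spam)
        p = spam_ratio / (ham_ratio + spam_ratio)
        corpus[token] = max(0.01, min(0.99, p))  # clamp away from 0 and 1
    return corpus


ham_messages = ["hello meeting tomorrow"] * 5
spam_messages = ["buy viagra now"] * 5
corpus = build_corpus(
    count_tokens(ham_messages),
    count_tokens(spam_messages),
    len(ham_messages),
    len(spam_messages),
)
```

With these toy messages, a token seen only in spam ends up at the upper clamp
and a token seen only in ham at the lower one.
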
> 
> If you're keeping a repository of the raw email messages (both Ham & Spam),
> then you could rebuild the token counts from scratch each time you rebuild
> the corpus.  This saves creating and maintaining the *_ham, *_spam, and
> *_messagecounts tables.  The analysis would take longer, but it could be
> performed at any time, even on a different machine than the mail server,
> and the mail server wouldn't have to spend processing time maintaining
> the additional tables.
> 
> If, however, you're not keeping a repository of the raw email messages, or if
> the admin wants to be able to maintain the corpus dynamically (updating it
> daily or something), then Method 2 would be better.  This would allow
> messages to be flagged by the user as SPAM and immediately update the
> corpus.  This method would potentially allow users/administrators to quickly
> stop new forms of SPAM getting past the blockers, and to use a variety of
> mechanisms for adding new Spam & Ham messages.
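
A rough sketch of the Method 2 flow, again in Python and again only under
assumptions: TokenTables, add_message, and rebuild_corpus are hypothetical
names standing in for the *_ham, *_spam, and *_messagecounts tables, and the
probability formula is an assumed Graham-style one. Flagging a message only
touches the stored counts; the corpus is then rebuilt from those counts
without re-reading any raw mail.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of letters, digits, $, apostrophes and dashes.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


class TokenTables:
    """In-memory stand-in for the ham, spam, and message-count tables."""

    def __init__(self):
        self.ham = Counter()
        self.spam = Counter()
        self.messages = {"ham": 0, "spam": 0}

    def add_message(self, text, is_spam):
        """Record one flagged message: update token counts and message count."""
        table = self.spam if is_spam else self.ham
        table.update(TOKEN_RE.findall(text))
        self.messages["spam" if is_spam else "ham"] += 1

    def rebuild_corpus(self):
        """Rebuild token probabilities from the stored counts alone."""
        n_ham, n_spam = self.messages["ham"], self.messages["spam"]
        if n_ham == 0 or n_spam == 0:
            return {}  # nothing can be computed without both message counts
        corpus = {}
        for token in set(self.ham) | set(self.spam):
            ham_ratio = min(1.0, 2 * self.ham[token] / n_ham)
            spam_ratio = min(1.0, self.spam[token] / n_spam)
            p = spam_ratio / (ham_ratio + spam_ratio)
            corpus[token] = max(0.01, min(0.99, p))
        return corpus


tables = TokenTables()
tables.add_message("cheap pills cheap", is_spam=True)
tables.add_message("lunch at noon", is_spam=False)
corpus = tables.rebuild_corpus()
```

Note that in this sketch an empty message-count table yields an empty corpus,
however many tokens the ham and spam tables hold.
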
> 
> Does that help explain the two methods a little better?
> 
> -Chris
> 
> 
> 
> --
> To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>


