> If I understand correctly, you could then add a "minimum life" variable
> that says the file has to be older than so many days before it can be
> deleted.  If the file is not older than "minimum life", it is left until
> it is older...then deleted.  Or something like that.  So, there would be
> some extra files in the collections, but I don't imagine they would hurt
> Bayesian too badly, seeing as that's how it already is today.

Hmm... that sounds like an idea that was brought up some time ago
(John was still the dev for ASSP at the time): set up some kind of
TTL parameter for corpus files, so that the spamdb rebuild would
check each file's date/time and, if it is older than the TTL (say "n"
days), delete the file.
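For illustration, the TTL check described above (which is essentially the quoted "minimum life" idea) could look roughly like the Python sketch below. It is a hypothetical helper, not ASSP code (ASSP itself is written in Perl): the function name prune_corpus, its parameters, and the use of a file's modification time as its "age" are all assumptions.

```python
import os
import time


def prune_corpus(folder, ttl_days, now=None, dry_run=False):
    """Delete corpus files older than ttl_days (hypothetical helper, not ASSP's API).

    A file's "age" is assumed to be its modification time.
    Returns the sorted list of paths that were (or, with dry_run, would be) deleted.
    """
    now = time.time() if now is None else now
    cutoff = now - ttl_days * 86400  # 86400 seconds per day

    # Collect expired files first, then delete them after the scan is finished.
    expired = []
    with os.scandir(folder) as entries:
        for entry in entries:
            if entry.is_file() and entry.stat().st_mtime < cutoff:
                expired.append(entry.path)

    if not dry_run:
        for path in expired:
            os.remove(path)
    return sorted(expired)
```

Running it with dry_run=True first shows what would go. Note that on a low-traffic box a short TTL can remove most of a folder, which is one of the drawbacks raised here.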

While at first it may sound like a "cool idea", it has some drawbacks
on both low- and high-traffic boxes. On a low-traffic box the
spam/notspam folders would quickly "age" and end up almost empty;
on a high-traffic box, since only the most recent messages would
survive, it would be easy to corrupt the corpus by sending in a bunch
of identical messages :P

Bottom line: the Bayes filter should work by /learning/, which means
it should NOT discard previous data, but rather REFINE it with
further incoming data; so maybe the whole Bayes approach used inside
ASSP should be revised so that it does NOT deal just with the latest
data, but learns and improves over time.
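By contrast, a filter that keeps learning would accumulate token counts message by message and never throw them away. The Python sketch below is a generic naive-Bayes illustration (per-token probabilities with add-one smoothing and equal class priors), not how ASSP actually builds its spamdb; the class name IncrementalBayes and its methods are hypothetical.

```python
import math
from collections import Counter


class IncrementalBayes:
    """Hypothetical incremental token model: counts are only ever added to."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.spam_msgs = 0
        self.ham_msgs = 0

    def learn(self, tokens, is_spam):
        # Count each token once per message and keep all earlier counts.
        unique = set(tokens)
        if is_spam:
            self.spam_tokens.update(unique)
            self.spam_msgs += 1
        else:
            self.ham_tokens.update(unique)
            self.ham_msgs += 1

    def token_prob(self, token):
        # Add-one smoothed per-token spam probability, strictly between 0 and 1.
        s = (self.spam_tokens[token] + 1) / (self.spam_msgs + 2)
        h = (self.ham_tokens[token] + 1) / (self.ham_msgs + 2)
        return s / (s + h)

    def score(self, tokens):
        # Naive-Bayes combination prod(p) / (prod(p) + prod(1 - p)) in log space.
        log_spam = log_ham = 0.0
        for t in set(tokens):
            p = self.token_prob(t)
            log_spam += math.log(p)
            log_ham += math.log(1 - p)
        diff = log_ham - log_spam
        # Numerically stable logistic: 1 / (1 + exp(diff)).
        if diff > 0:
            e = math.exp(-diff)
            return e / (1 + e)
        return 1 / (1 + math.exp(diff))
```

Because the counts live in the model rather than being recomputed from whatever files remain on disk, old evidence is kept and only refined by each new message.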



_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test
