Internet Chicago Staff wrote:
> Actually I'm under the impression that the corpus can and does become 
> corrupt.  My inner geek thinks I should look at it on occasion.
>   

I didn't mean to put words in your mouth. I only wanted to fully address 
the situation and the ways it can be handled.

You are absolutely right that it can become corrupt, although I prefer 
to refer to it as "polluted".  This pollution is the reason behind my 
previous post regarding using lists like the redRe to prevent certain 
items from entering the corpus.

IMHO, corpus pollution is the #1 failure of most Bayesian-based 
solutions I have tried.  Thankfully, ASSP has a number of features that 
let you seriously minimize the risk, provided you can identify what is 
causing the pollution.

Moving those files around isn't going to solve your problem, because the 
pollution will continue.  You need to address the cause of the pollution 
directly.

If you are in a situation like the one I have been in, the pollution is 
probably being caused by excessive personal email in a business 
environment.  The redRe can save you here with the right regular 
expressions.
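
To illustrate the idea, here is a rough sketch in Python. It is not how 
ASSP implements the redRe (ASSP is written in Perl and takes the pattern 
as a configuration setting), and the patterns, function names and sample 
messages below are made up for the example.  It only shows the principle: 
anything that matches the "red" pattern is kept out of the corpus.

```python
import re

# Hypothetical patterns for personal mail in a business setting.
# These are illustrative assumptions, not patterns shipped with ASSP.
RED_RE = re.compile(
    r"fwd?:.*\b(joke|funny)\b|\bhappy birthday\b|\bchain letter\b",
    re.IGNORECASE,
)


def is_red(message: str) -> bool:
    """Return True if the message matches the red pattern."""
    return RED_RE.search(message) is not None


def add_to_corpus(corpus: list, message: str) -> bool:
    """Append the message unless it is red; report whether it was stored."""
    if is_red(message):
        return False
    corpus.append(message)
    return True


corpus = []
add_to_corpus(corpus, "Subject: Q3 invoice attached")        # stored
add_to_corpus(corpus, "Subject: Fwd: funny joke of the day")  # skipped
```

The real work is in choosing the patterns: they need to match the 
personal mail that is polluting your corpus without catching legitimate 
business mail.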


_______________________________________________
Assp-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-user
