Internet Chicago Staff wrote:
> Actually I'm under the impression that the corpus can and does become
> corrupt. My inner geek thinks I should look at it on occasion.
>
I didn't mean to put words in your mouth. I only wanted to address the situation fully, along with the ways it can be handled.

You are absolutely right that the corpus can become corrupt, although I prefer to call it "polluted". This pollution is the reason behind my previous post about using lists like the redRe to prevent certain items from entering the corpus. IMHO, corpus pollution is the #1 cause of failure in the majority of Bayesian-based solutions I have tried. Thankfully, ASSP has a number of features that let you seriously minimize the risk, provided you can identify what is causing the pollution.

Moving those files around isn't going to solve your problem, because the pollution will continue. You need to address the cause of the pollution directly. If you are in a situation like the one I have been in, the pollution is probably being caused by excessive personal email in a business environment. The redRe can save you here with the right regular expressions.
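
To give a rough idea of what I mean by "the right regular expressions", here is a minimal sketch in Python for trying out a candidate pattern against a few sample messages before putting anything into the redRe. The phrases in the pattern, the sample messages, and the helper name would_be_red are all made up for illustration; they are not a recommended list and not anything ASSP ships with. ASSP itself evaluates Perl regular expressions, so treat this only as a quick way to sanity-check a simple alternation of phrases.

```python
import re

# Hypothetical phrases that might mark personal rather than business mail.
# These are illustrative assumptions, not a shipped or recommended ASSP list.
PERSONAL_MAIL_PATTERN = r"happy birthday|fantasy football|fwd?: *(joke|funny)|baby shower"

# Case-insensitive matching, so "FW: Funny" and "fw: funny" are treated alike.
personal_re = re.compile(PERSONAL_MAIL_PATTERN, re.IGNORECASE)


def would_be_red(message_text: str) -> bool:
    """Return True if the draft pattern matches anywhere in the message text."""
    return personal_re.search(message_text) is not None


# Made-up sample messages for checking the draft pattern.
samples = [
    "Subject: FW: funny video from the weekend",
    "Subject: Happy Birthday Bob!",
    "Subject: Q3 invoice attached",
]

for text in samples:
    label = "match" if would_be_red(text) else "no match"
    print(f"{label:9} {text}")
```

The point of testing first is to see which business messages a pattern would catch by accident. Only you know what counts as personal mail at your site, so build the list from what you actually find polluting your own corpus.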
