Charles Marcus wrote:
> Hi Fritz, I'm only involved in this discussion as an interested bystander. I am very curious as to whether or not Michael has stumbled onto one of the biggest problems with Bayesian spam filters, and if he has also developed a brilliant solution using a mechanism of ASSP.

I don't know if it was "brilliant", but as far as I know it worked to counteract the problem I saw.

> It was the *fact* that more and more spam was getting by ASSP that started Michael looking into *why* - and he discovered why, and then developed a solution *which* *worked*.

It either worked or by massive coincidence the problem went away. To see an immediate effect, I had to manually search for and remove messages from the corpus that blocked my criteria. I did this on a scripted nightly basis until the redlist functionality was recently altered to prevent corpus adding.

> I certainly think your opinion has much more weight than anyone else's here, and far be it from me to even seriously consider trying to challenge you on it, but as a user, this issue does interest me.

Whether or not it comes out in my emails, I agree. I am having a hard time understanding if Fritz simply doesn't believe me and my specific circumstance or doesn't believe that the issue could possibly exist at all. Either way, I hold a high respect for Fritz and his opinion on this as well. Certainly this situation doesn't hold true for everyone. It didn't for me either at previous companies that I set up ASSP for - but certainly the situation *can* exist.

> So, what would you suggest as the best approach for Michael (and others) to deal with this particular, peculiar situation?

That I would also like to know.

> It's a fact that for some people at certain installations, Bayesian scoring can become useless. This is typically from improper whitelisting (such as not redlisting auto-replies etc.).

I have been running ASSP for years, and I know how to take those precautions.
At my current location, I restarted ASSP's database three times, running it with subjects as file names so I could visually determine why my corpus continued to degrade each time. I also closely monitored all my lists (especially the whitelist) for improper listings. Removing (and preventing) what I saw as "pollution" was the only means I could come up with that was effective at preventing corpus degradation.
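
For anyone wanting to try a similar nightly cleanup, the idea can be sketched roughly as below. This is only an illustration, not the script used here: the patterns, the function names, and the assumption that the corpus is a directory of raw message files are all hypothetical, and the real criteria are not given in this thread. It defaults to a dry run so nothing is deleted until the flagged list has been checked by hand.

```python
import re
from pathlib import Path

# Hypothetical patterns marking a message as corpus "pollution".
# The real criteria are site-specific and not stated in this thread.
POLLUTION_PATTERNS = [
    re.compile(rb"^Subject:.*out of office", re.IGNORECASE | re.MULTILINE),
    re.compile(rb"^Auto-Submitted:\s*auto-replied", re.IGNORECASE | re.MULTILINE),
]


def is_pollution(raw: bytes) -> bool:
    """Return True if any pattern matches the raw message bytes."""
    return any(p.search(raw) for p in POLLUTION_PATTERNS)


def clean_corpus(corpus_dir: Path, dry_run: bool = True) -> list[Path]:
    """Find (and, unless dry_run is set, delete) polluting messages.

    corpus_dir is assumed to hold one raw message per file.
    """
    flagged = []
    for path in sorted(corpus_dir.iterdir()):
        if path.is_file() and is_pollution(path.read_bytes()):
            flagged.append(path)
            if not dry_run:
                path.unlink()
    return flagged
```

Run from cron against a copy of the corpus first, e.g. `clean_corpus(Path("/path/to/corpus/copy"))`, and only pass `dry_run=False` once the flagged files look right.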
