Charles Marcus wrote:
Hi Fritz,

I'm only involved in this discussion as an interested bystander. I am 
very curious as to whether or not Michael has stumbled onto one of the 
biggest problems with Bayesian spam filters, and if he has also 
developed a brilliant solution using a mechanism of ASSP.

I don't know if it was "brilliant", but as far as I know it worked to counteract the problem I saw.

It was the *fact* that more and more spam was getting by ASSP that 
started Michael looking into *why* - and he discovered why, and then 
developed a solution *which* *worked*.

It either worked, or by massive coincidence the problem went away. To see an immediate effect, I had to manually search for and remove messages from the corpus that matched my criteria. I did this on a scripted nightly basis until the redlist functionality was recently altered to prevent those messages from being added to the corpus.
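
A nightly cleanup of this kind could be sketched roughly as follows. This is an illustration only, not the actual script: it assumes the corpus is a directory holding one plain-text message per file, and the function names, the directory argument, and the example pattern are all hypothetical.

```python
import re
from pathlib import Path


def find_polluting_messages(corpus_dir, patterns):
    """Return corpus files whose text matches any of the regex patterns."""
    compiled = [re.compile(p, re.IGNORECASE | re.MULTILINE) for p in patterns]
    matches = []
    for path in sorted(Path(corpus_dir).iterdir()):
        if not path.is_file():
            continue
        # Undecodable bytes are replaced so one odd message cannot abort the run.
        text = path.read_text(errors="replace")
        if any(rx.search(text) for rx in compiled):
            matches.append(path)
    return matches


def prune_corpus(corpus_dir, patterns, dry_run=True):
    """Delete matching corpus files; with dry_run=True only report them."""
    hits = find_polluting_messages(corpus_dir, patterns)
    if not dry_run:
        for path in hits:
            path.unlink()
    return hits
```

Running it first with dry_run=True lists what would be removed, for example prune_corpus("corpus_dir", [r"^Subject:.*out of office"]), and the same call with dry_run=False could then be scheduled nightly.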

I certainly think your opinion has much more weight than anyone else's 
here, and far be it from me to even seriously consider trying to 
challenge you on it, but as a user, this issue does interest me.
  

Whether or not it comes out in my emails, I agree. I am having a hard time understanding whether Fritz simply doesn't believe me and my specific circumstance, or doesn't believe that the issue could possibly exist at all. Either way, I have great respect for Fritz and his opinion on this as well.

Certainly this situation doesn't hold true for everyone.  It didn't for me either at previous companies that I set up ASSP for - but certainly the situation *can* exist.

So, what would you suggest as the best approach for Michael (and others) 
to deal with this particular, peculiar situation?
  

That I would also like to know. It's a fact that for some people at certain installations, Bayesian scoring can become useless. This typically results from improper whitelisting (such as not redlisting auto-replies, etc.). I have been running ASSP for years, and I know how to take those precautions. At my current location, I restarted ASSP's database three times, running it with subjects as file names so I could visually determine why my corpus continued to degrade each time. I also closely monitored all my lists (especially the whitelist) for improper listings.

Removing (and preventing) what I saw as "pollution" was the only means I could come up with that was effective at preventing corpus degradation.

