Thanks for the reply, it is really helpful. I'm experiencing much better results with HMM set to monitor, and I will continue to test. The Bayesian filter seems to be doing very well on its own, but I'll investigate the corpus further as you recommend. I'll have to switch back to subject name logging first.

I do appreciate this!

Miles

-----Original Message-----
From: K Post [mailto:nntp.p...@gmail.com]
Sent: 29 May 2015 15:32
To: For Users of ASSP
Subject: Re: [Assp-user] I think I'm doing something wrong

I agree with your analysis of the problem. Why it's happening, I can't say - I'm sure Thomas will chime in, but in the interim, if you're not finding HMM reliable (please check more than just the one message), consider turning it to monitor mode instead of scoring/blocking so that the HMM inaccuracies don't cause poor results. At least that's what I would do.

I went through this when first turning on HMM. It turned out that the corpus wasn't great, and Bayesian wasn't perfect either. I use subject name logging, which helps me identify incorrectly classified messages in the corpus. I sorted the spam, not-spam, and errors folders by name, then eyeballed the filenames, looking for obvious errors. I also used messages from ok mail and discarded that I manually reviewed and moved to spam/not-spam to help learn. It's a tedious process, but necessary IMO when the corpus is too far out of whack.

I also run a block report for all addresses sent to me nightly, and review every single block. Of course this doesn't show bad mail that is delivered - I rely on users to report those, and they're starting to. Big help.

Hope this helps you get the server running as it should.

On Fri, May 29, 2015 at 9:27 AM, Miles Gaynor <m...@castlehoward.co.uk> wrote:
> Folks,
> I've noticed a large increase in spam recently, mainly from marketing
> types sending me genuine marketing that I don't actually want. I
> assumed that was because the messages were convincing but I sent one
> to asspanalyze@assp.local and this is
> the final part of the message.
>
> Bayesian Spam Probability:
> combined probability:
> 1.00000000 - got 111 - used 60 most significant results
>
> ________________________________
>
> Hidden-Markov-Model Spam Probability:
> combined HMM spam probability:
> 0.0000 - got 44 - used 44 most significant results
>
> If I'm reading that correctly, the Bayesian filter is working fine but
> the HMM isn't agreeing with it.
> Could you give me a clue? (yes, I'm pretty clueless).
> Thank you,
> Miles
>
> ________________________________
> Castle Howard Estate is a limited company registered in England and Wales.
> Registered Number: 480214. Registered Office: Estate Office, Castle
> Howard, York, YO60 7DA. This message is private and confidential. If
> you have received this message in error, please notify us and remove
> it from your system.
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Assp-user mailing list
> Assp-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/assp-user
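
The corpus review K Post describes above (sort the spam, not-spam, and errors folders by name, then scan the filenames for obvious misclassifications) could be roughly sketched in Python as below. This is only an illustration, not part of ASSP: the function names, the keyword list, and the idea of passing folder paths on the command line are all assumptions, and it only helps if subject name logging is on so that filenames carry the subject.

```python
import sys
from pathlib import Path

# Hypothetical keyword hints (an assumption, not an ASSP setting);
# tune these to the kind of mail you actually receive.
SPAM_HINTS = ("unsubscribe", "offer", "discount", "newsletter")


def list_sorted(folder):
    """Return the file names in `folder`, sorted case-insensitively."""
    names = (p.name for p in Path(folder).iterdir() if p.is_file())
    return sorted(names, key=str.lower)


def flag_suspects(folder, hints=SPAM_HINTS):
    """Return sorted file names that contain any hint (case-insensitive)."""
    return [name for name in list_sorted(folder)
            if any(hint in name.lower() for hint in hints)]


if __name__ == "__main__":
    # Pass the corpus folders to review as arguments, e.g. your
    # not-spam folder; the paths depend on your own installation.
    for folder in sys.argv[1:]:
        print(f"== {folder} ==")
        for name in flag_suspects(folder):
            print("  check:", name)
```

Running it against the not-spam folder lists filenames that look like marketing and so deserve a manual look. It only flags candidates; moving a message to the right folder is still a human decision.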