-------- Original Message --------
> Date: Thu, 5 Feb 2009 17:07:28 +0100
> From: "Jehan Pagès" <[email protected]>
> To: [email protected]
> Subject: Re: [Dspam-user] spam_train does not work

> Hi,
> 
> On Thu, Feb 5, 2009 at 4:36 PM, Steve <[email protected]> wrote:
> 
> > > PValue markov
> > >
> > Hmmm... it's been some time since I modified the DSPAM Gentoo ebuild, but
> > if memory serves me right, I added a big, fat warning about using MARKOV
> > with anything other than the hash driver. Please change PValue to another
> > probability algorithm or switch from MySQL storage to hash.
> >
> 
> There is no such warning. There is only:
>  # Don't mess with this unless you know what you're doing.
> As for me, I don't really know what I am doing :p, and I don't really
> remember whether I changed it or not, but if I did, it was not through my
> own "intelligence", but by following advice from one of the many tutorials
> I found, or from reading articles, or something else...
> 
> Anyway, a clearer message like the one you wrote here could be worth
> adding back to the default Gentoo ebuild's conf file.
> 
> Now I am running a training test again... strangely enough, the stats
> don't change much here, even after training on a big corpus from the
> SpamAssassin public corpus. So when do the "SC Spam Corpusfed" and "NC
> Nonspam Corpusfed" counters increase, if not during a dspam_train?
>
btw: I do only SC/NC (corpus-fed) training over here. For that I have a modified
training script that does the SC/NC training and applies TONE (train on or near
error). The result is that training now takes between 1/3 and 1/2 of the normal
training time, and the data is way, way, way smaller than with normal training.
Just the other week I helped another Gentoo user train his DSPAM installation.
The result was that he went from 483.33M of data (normal training) down to
47.79M, roughly a tenth of the size. The training was slightly faster for him,
but not as significantly as on my setup (keep in mind that I have a
significantly rewritten storage driver for MySQL on one of my setups).
Checking accuracy on a corpus he had never trained on resulted in 99.x%, and on
another corpus in 98.x%. I am not 100% sure about the exact figures since I
cannot find his mail with the real numbers, but I remember one being 99.9
something and the other > 98.5 something. Anyway... the accuracy was pretty high
considering that he had never trained on mails from that corpus and that it was
a very messy and error-prone corpus (spamarchive.org submit and autosubmit).
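
As an illustration of the train-on-or-near-error idea mentioned above, here is a
minimal Python sketch. It is not the modified training script referred to in
this mail and it does not use DSPAM at all: TinyBayes, tone_train and the
margin value of 0.25 are hypothetical names and numbers chosen only for the
example. The point it shows is that a message is learned only when the filter
gets it wrong or is not confident about it, so most of a repetitive corpus is
skipped and the stored data stays small.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy token classifier used as a stand-in filter (hypothetical, not DSPAM)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}

    def train(self, text, label):
        # Count every whitespace-separated token under the given label.
        for token in text.lower().split():
            self.counts[label][token] += 1

    def spam_probability(self, text):
        # Sum per-token log-odds with add-one smoothing; unseen tokens add 0.
        log_odds = 0.0
        for token in text.lower().split():
            spam_hits = self.counts["spam"][token] + 1
            ham_hits = self.counts["ham"][token] + 1
            log_odds += math.log(spam_hits / ham_hits)
        # Clamp so math.exp cannot overflow on very long messages.
        log_odds = max(-30.0, min(30.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


def tone_train(classifier, corpus, margin=0.25):
    """Train only on errors, or on results closer than `margin` to 0.5.

    `corpus` is an iterable of (text, label) pairs with label "spam" or "ham".
    Returns the number of messages that were actually learned.
    """
    trained = 0
    for text, label in corpus:
        p = classifier.spam_probability(text)
        predicted = "spam" if p >= 0.5 else "ham"
        near_error = abs(p - 0.5) < margin
        if predicted != label or near_error:
            classifier.train(text, label)
            trained += 1
    return trained


if __name__ == "__main__":
    corpus = [("cheap pills buy now", "spam"),
              ("meeting agenda for monday", "ham")] * 20
    clf = TinyBayes()
    learned = tone_train(clf, corpus)
    print(f"learned {learned} of {len(corpus)} messages")
```

With this toy corpus only the first occurrence of each message is learned; the
repeats are classified confidently and correctly, so they are skipped. How much
a real corpus shrinks depends on the filter and on the margin chosen.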


> But the training output looks much more interesting now! Not all messages
> are passing as non-spam as they did before (I had 0 true positives, now I
> have 478 of them!).
> 
> So things look better, thanks very much! I will now wait a day or so to
> check that some spams are really caught and sent to the Junk/ folder of my
> mailbox as expected in my dspam configuration. If this works, I will be
> happy and report it here. Then I will take care of the next issue:
> understanding how to make a single shared dspam group for all users...
> Thanks again.
> 
> Jehan
>
Steve


_______________________________________________
Dspam-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-user
