Re: [Dspam-user] Increase Spam Hit Rate

Stevan Bajić Wed, 18 Apr 2012 14:25:51 -0700

On 18.04.2012 22:38, Steve Fatula wrote:


    *From:* Bradley Giesbrecht <bradley.giesbre...@gmail.com>
    *To:* Steve Fatula <compconsult...@yahoo.com>
    *Cc:* Dspam List <dspam-user@lists.sourceforge.net>
    *Sent:* Wednesday, April 18, 2012 3:04 PM
    *Subject:* Re: [Dspam-user] Increase Spam Hit Rate

    I can't help you other than to point out that you may have missed
    the two replies prior to the one you responded to, both of which
    suggested switching from 'TrainingMode TEFT' to 'TrainingMode TOE'.

Ok, so, what I had tried to ask for and was hoping to get was some sort of explanation as to why this might be the case when so many use TEFT, yet, I should try TOE.

English is not my native language and explaining something so technical in English instead of my native language is not always as easy as one might think. But here we go... I will try to explain what TEFT is and why TOE is better.

TEFT stands for 'train everything'. Most users will tell you that it stands for 'train every fucking time'.

TOE stands for 'train on error'.

Allow me to ask how you yourself learn? I mean you, Steve Fatula. How do you learn? How did you learn as a kid that 2 plus 2 is equal to 4? Probably you learned it once and since then it has been in your mind. Probably in the beginning you learned it symbolically, that the 'picture' '2+2' is '4'. And later you learned that '+' is an addition and you learned how that '+' works and you learned the numbers, and after you learned that logic/mechanism you were able to sum almost any number with almost any other number. Right?

After that you did not need to learn any more how addition worked. Right? Until the moment someone told you to sum '(-20) + 3'. You probably got it wrong at first and then you learned how to do it right. Until someone asked you to sum '(-10)+(-30)'. Probably you got that wrong at first too, and then you learned how to sum multiple negative numbers and after that you were able to master that too. Right?

All of the above is how TOE works. It learns and is happy until it makes an error, and then it learns from its own errors.

TEFT, on the other hand, is learning constantly. Even the right answers. It learns and learns and learns and learns.

One could now say that since TEFT is learning constantly, it is improving constantly. But this is not the case. Learning is good, but TEFT over-learns very easily. In the beginning everyone was thinking: more learning = more catching spam

But today we know that this is not right. Imagine your brain worked the same way: you would almost not be able to exist. You would not even be able to just read this message here. Instead of just reading the letters and words you would LEARN the letters and words. Yeah. Learn them and read them. The same learning as you did when you were a kid and first started to learn to read.

Another bad aspect of TEFT is the fact that most users never or rarely train. So since TEFT is constantly learning, it will constantly learn WRONG things if the users don't correct it. Allow me to explain:


1) 1+1=2
2) 1+2=3
3) 2*3=6
4) 7-2=4
5) 3*3=4

1, 2 and 3 are mathematically correct while 4 and 5 are wrong.

Now let's say that a user is running TEFT and that DSPAM is saying for all the messages (1 to 5) that they are mathematically correct. In that case DSPAM would relearn 1, 2 and 3 (making that response stronger) and learn WRONGLY that '7-2=4' and that '3*3=4'.

Now let's say that a user is running TOE and that DSPAM is saying for all the messages (1 to 5) that they are mathematically correct. In that case DSPAM would NOT LEARN ANYTHING. It will not wrongly learn anything.


This is a huge difference!
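
The five-message example can be replayed as a small Python sketch. This is only a toy model of the idea, not DSPAM's code: DSPAM works on tokens and not on whole equations, and the names `MESSAGES`, `learn_without_corrections` and `wrongly_learned` are made up for this illustration. The filter is assumed to call every message "correct", and the user is assumed never to correct it.

```python
# Toy model only: DSPAM really works on tokens, not on whole equations.
# Each entry is (message, is it really mathematically correct?).
MESSAGES = [
    ("1+1=2", True),
    ("1+2=3", True),
    ("2*3=6", True),
    ("7-2=4", False),
    ("3*3=4", False),
]


def learn_without_corrections(mode):
    """Return the (message, verdict) pairs the filter trains on by itself."""
    learned = []
    for text, _truth in MESSAGES:
        verdict = True  # assumption: the filter claims every message is correct
        if mode == "TEFT":
            # TEFT trains on its own verdict for every single message.
            learned.append((text, verdict))
        # TOE: nothing is trained until somebody reports an error.
    return learned


def wrongly_learned(mode):
    """Messages whose learned verdict disagrees with the truth."""
    truth = dict(MESSAGES)
    return [text for text, verdict in learn_without_corrections(mode)
            if verdict != truth[text]]


print(wrongly_learned("TEFT"))  # ['7-2=4', '3*3=4']
print(wrongly_learned("TOE"))   # []
```

With TEFT the two wrong equations end up in the learned data, with TOE nothing is learned at all until an error is reported.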

In the real world most users are very lousy trainers. So even if they don't train, they are constantly making their token data less accurate if they use TEFT. Had they used TOE, they would not have learned wrong things. Their data would get less accurate too, since their TP/TN count would increase with each message, but the decrease in accuracy would be less accelerated than it is with TEFT.


Do you understand this?

Ahh... and TEFT is constantly either producing new tokens or increasing the count (ham/spam) for tokens. TOE is not doing that. So in the long run TOE produces less data and still is more accurate than TEFT. TEFT is a brutal way of learning while TOE is more intelligent.
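
The difference in data volume can be sketched the same way. Again this is a rough illustration and not how DSPAM stores its tokens: the `simulate` function, the made-up token names and the assumption that every tenth message is misclassified are all invented for the example, and the real per-token ham/spam counters are reduced to a single count.

```python
from collections import Counter


def simulate(mode, stream):
    """Return (distinct tokens stored, token counter updates) for one mode.

    stream is a list of (tokens, was_misclassified) pairs.
    """
    store = Counter()
    updates = 0
    for tokens, was_misclassified in stream:
        # TEFT trains on every message, TOE only on reported errors.
        if mode == "TEFT" or was_misclassified:
            store.update(tokens)
            updates += len(tokens)
    return len(store), updates


# Hypothetical stream: 100 messages with 3 tokens each,
# and every 10th message is assumed to be misclassified.
stream = [
    (["common", f"word{i % 20}", f"unique{i}"], i % 10 == 0)
    for i in range(100)
]

print(simulate("TEFT", stream))  # (121, 300)
print(simulate("TOE", stream))   # (13, 30)
```

In this made-up stream TEFT ends up with 121 stored tokens and 300 counter updates, while TOE only touches the 10 misclassified messages and ends up with 13 tokens and 30 updates.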

Now you will ask me why all the others don't use TOE and why DSPAM does not set TOE as default. Well, the second one is easy to answer: TEFT was the default in the past (with CHAIN) and our release manager does not like us to change old defaults. For the first question: most people just follow some how-to they find on the net, without even knowing what they do. And on small (very small) installs TEFT produces results very quickly while TOE can take some time to kick in. But this is with CHAIN. Users using something like OSB don't suffer from this as much as users using CHAIN.

Otherwise, it sounded more like a guess. Perhaps, there is no way to know what mode should be used. If there was though, was hoping for some sort of reason or methodology. I had suggested / asked if perhaps, it was due to having a low rate of spam, percent wise. This was my attempt to put some reason into the change. I have not worked on the code, and, did not plan on reading the code to figure out exactly how it worked and the whys of it, had hoped someone else might have an explanation.
Wiping out all the training and stuff for a guess (IF it was a guess, not saying it was as I don't know) shouldn't be taken lightly (which is the suggestion). That's a lot of time and effort, though, it wasn't yielding the greatest results anyway.

It's not that much time. It should not take you more than an overnight automatic training run to produce a very good merged global group.

So, if there is no way to know, and the only solution is to simply reload DSPAM and try a dozen combinations, that's not a very good use of my time. I'd probably just eliminate DSPAM at that point and use another product that does not require so much time.
If there is a way to know or make some sense out of it, I'd love to hear it. That's all I am saying. I hope that makes more sense and is not unreasonable.
------------------------------------------------------------------------------
Better than sec? Nothing is better than sec when it comes to
monitoring Big Data applications. Try Boundary one-second
resolution app monitoring today. Free.
http://p.sf.net/sfu/Boundary-dev2dev


_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user



--
Kind Regards from Switzerland,

Stevan Bajić

