Re: Using the Bayesian Analysis mailet

Marc Chamberlin Fri, 26 Sep 2008 17:35:12 -0700

Thanks David for all your help! I really appreciated it. I think I have the Bayesian filter running fine now, and it does seem to be getting better at sorting out spam from the good stuff, especially as I send it feedback on what it misses as spam and what it mistakenly thinks is spam. I have set up James to send to me, wearing my "postmaster" hat, everything that the Bayesian filter thinks is spam. Occasionally I do find an email that was not really spam and I have to do two things with it: send it back to James's nospam mailet, and forward it on to the user who should have gotten it. This latter step is a bit of a pain because I usually have to clean up the email (or forward it as is).

Is there an easy way to send the original version of an email to the user when it was accidentally marked as spam? I hope I don't have to keep double checking all this spam, I get thousands daily... How do you and others handle the spam that the Bayesian filter catches? Suggestions welcomed, this could get rather tedious for me...


   Marc...

David Legg wrote:

Hi Marc,
My server is already set up to use SMTP AUTH. I have a single domain name that I have purchased from Network Solutions, let's call it mydomain.com. I have listed mydomain.com in the <servernames> section of the James config.xml file. I want this server to service both internal and external users. So what exactly is this suggestion asking me to do? Do I need to purchase another domain name in order to run this Bayesian Analysis mailet? That does not make sense to me...
No, there is no need to purchase another domain. The instructions are suggesting that you choose email addresses which are impossible for outsiders to use in order to tell the server what is ham and what is spam. For example, in my server I literally chose '[EMAIL PROTECTED]' and not.'[EMAIL PROTECTED]'. Not only are these unlikely to be real domains, but anyone trying to send emails addressed to these addresses will be challenged for an SMTP password. Thus only authorized users will be able to train the Bayesian analysis filter.
2. I am not sure I fully understand the concept of having both a spam and a ham feedback to the Bayesian Analyzer. Spam I can understand, that is used to teach the analyzer what is spam. But why have a ham feedback? Do the users have to teach the analyzer what is good email also??? That seems like an extraordinary burden to place on them.
It is true that the Bayesian filter works best with examples of both spam and not spam (ham). This is the drawback of this technique. The other drawback is that the system doesn't discriminate between one user's view of what is spam and another's. The best you can hope for is a general consensus. To be honest I wouldn't trust the users to keep the system up to date. I tend to do all the spam control myself. Don't forget that each individual user can still have their own anti-spam tools. Your aim is to keep out the bulk of the spam... you won't be able to completely eradicate it.
... So what did I do wrong? Doesn't seem to have worked too well 'out of the box'!
The filter works by classifying some email and giving it a score. It doesn't do anything else to it. You have to set up the pipeline to do something with messages which score too highly as spam. Initially, I forwarded all failed emails to the postmaster address so that I could check them manually and forward them to their owner if they were mis-classified. These days the filter is so good I simply throw away anything over a 50% threshold.
Hopefully someone will help walk me out of these woods, I am kinda lost.. Thanks in advance...
It looks like you have done the worst bit already. However, a difficult bit is deciding on how to process email through your pipeline. If it helps, here is a shortened version of my config.xml file showing the Bayesian analysis settings I use. Where I have not shown parts of the file I have marked them with '...' characters. Notice the commented out section which controls whether messages considered as spam are simply deleted or sent to the postmaster.
Hope this helps.

Regards,
David Legg

------------------------ config.xml ---------------------------------

...
<config>
...
  <spoolmanager>
     <threads> 5 </threads>

     
     <processor name="root">
...
        
        <mailet match="[EMAIL PROTECTED]" class="BayesianAnalysisFeeder">
           <repositoryPath> db://maildb </repositoryPath>
           <feedType>ham</feedType>
           <maxSize>500000</maxSize>
        </mailet>
           
        <mailet match="[EMAIL PROTECTED]" class="BayesianAnalysisFeeder">
           <repositoryPath> db://maildb </repositoryPath>
           <feedType>spam</feedType>
           <maxSize>500000</maxSize>
        </mailet>
...
        

        
        
        <mailet match="HasMailAttribute=spamChecked" class="ToProcessor">
           <processor> transport </processor>
        </mailet>
        
        <mailet match="All" class="SetMailAttribute">
           <spamChecked>true</spamChecked>
        </mailet>

        
        <mailet match="SMTPAuthSuccessful" class="ToProcessor">
           <processor> transport </processor>
        </mailet>
...               
        <mailet match="All" class="BayesianAnalysis" onMailetException="ignore">
           <repositoryPath>db://maildb</repositoryPath>
           <maxSize>3000000</maxSize>
           <headerName>X-MessageIsSpamProbability</headerName>
           <ignoreLocalSender>false</ignoreLocalSender>
        </mailet>
        <mailet match="CompareNumericHeaderValue=X-MessageIsSpamProbability > 0.50" class="SetMailAttribute" onMatchException="noMatch">
           <isSpam>true</isSpam>
        </mailet>
        <mailet match="CompareNumericHeaderValue=X-MessageIsSpamProbability > 0.50" class="SetMimeHeader" onMatchException="noMatch">
           <name>X-MessageIsSpam</name>
           <value>true</value>
        </mailet>
        <mailet match="CompareNumericHeaderValue=X-MessageIsSpamProbability > 0.50" class="ToProcessor" onMatchException="noMatch">
           <processor> spam </processor>
           <notice>Spam not accepted</notice>
        </mailet>

        <mailet match="All" class="ToProcessor">
           <processor> transport </processor>
        </mailet>
     </processor>
...
     <processor name="transport">
        <mailet match="SMTPAuthSuccessful" class="SetMimeHeader">
           <name>X-UserIsAuth</name>
           <value>true</value>
        </mailet>
...
     </processor>

     <processor name="spam">
        <mailet match="All" class="Null"/>

        
     </processor>
...
  </spoolmanager>
...
</config>
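
For anyone who would rather review suspected spam than discard it, the spam processor above could hand messages to the postmaster instead of using the Null mailet. What follows is only a sketch and is not taken from David's actual file (his commented out section is not shown in this message). It assumes the stock Forward mailet and its forwardTo parameter are available in your James version, and the postmaster address at mydomain.com is a placeholder.

```xml
<processor name="spam">
   <!-- Assumption: Forward with a forwardTo parameter exists in your James release. -->
   <!-- postmaster@mydomain.com is a placeholder address; substitute your own. -->
   <mailet match="All" class="Forward">
      <forwardTo> postmaster@mydomain.com </forwardTo>
   </mailet>
</processor>
```

Only one processor named "spam" should be defined at a time, so a block like this would replace the Null variant rather than sit alongside it. Check the mailet documentation for your release before relying on the parameter name.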


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



