Thanks David for all your help! I really appreciated it. I think I have the Bayesian filter running fine now, and it does seem to be getting better at sorting out spam from the good stuff, especially as I send it feedback on the spam it misses and on the good mail it mistakenly flags as spam. I have set up James to send everything that the Bayesian filter thinks is spam to me, wearing my "postmaster" hat. Occasionally I do find an email that was not really spam, and then I have to do two things with it: send it back to James's nospam mailet, and forward it on to the user who should have gotten it. This latter step is a bit of a pain because I usually have to either clean up the email or forward it as is.

Is there an easy way to send the original version of an email that was accidentally marked as spam to the user it was meant for? I hope I don't have to keep double-checking all this spam; I get thousands daily... How do you and others handle the spam that the Bayesian filter catches? Suggestions welcome, this could get rather tedious for me...
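
One possible approach, sketched here in Python rather than as a James feature: if the original message can be saved as raw bytes (for example as an .eml file), it can be re-sent with a new SMTP envelope recipient, so the headers and body reach the user untouched. The function names, the `localhost:25` default and the `smtp_factory` parameter are assumptions made for illustration. A re-sent message would also pass through the filter again unless the submission is authenticated or otherwise exempted, and that part is left out.

```python
import smtplib
from email import message_from_bytes
from email.utils import parseaddr


def envelope_sender(raw_message):
    """Pick an envelope sender from the stored message (Return-Path, else From)."""
    msg = message_from_bytes(raw_message)
    for header in ("Return-Path", "From"):
        _, addr = parseaddr(str(msg.get(header, "")))
        if addr:
            return addr
    return ""  # empty reverse-path when no usable address is found


def redeliver(raw_message, recipient, host="localhost", port=25,
              smtp_factory=smtplib.SMTP):
    """Re-send the untouched message bytes to the user who should have got it.

    host, port and smtp_factory are illustrative defaults, not James settings.
    """
    with smtp_factory(host, port) as smtp:
        # Only the SMTP envelope changes; headers and body go out as stored.
        smtp.sendmail(envelope_sender(raw_message), [recipient], raw_message)
```

Calling `redeliver(raw_bytes, "user@mydomain.com")` would hand the stored message to the local server for that user. The `smtp_factory` hook is only there so the function can be exercised without a live server.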

   Marc...

David Legg wrote:
Hi Marc,

My server is already set up to use SMTP AUTH. I have a single domain name that I have purchased from Network Solutions, let's call it mydomain.com. I have listed mydomain.com in the <servernames> section of the James config.xml file. I want this server to service both internal and external users. So what exactly is this suggestion asking me to do? Do I need to purchase another domain name in order to run this Bayesian Analysis mailet? That does not make sense to me...

No, there is no need to purchase another domain. The instructions are suggesting that you choose email addresses which are impossible for outsiders to use in order to tell the server what is ham and what is spam. For example, in my server I literally chose '[EMAIL PROTECTED]' and not.'[EMAIL PROTECTED]'. Not only are these unlikely to be real domains but anyone trying to send emails addressed to these addresses will be challenged for an SMTP password. Thus only authorized users will be able to train the Bayesian analysis filter.

2. I am not sure I fully understand the concept of having both a spam and a ham feedback to the Bayesian Analyzer. Spam I can understand, that is used to teach the analyzer what is spam. But why have a ham feedback? Do the users have to teach the analyzer what is good email also??? That seems like an extraordinary burden to place on them.

It is true that the Bayesian filter works best with examples of both spam and not spam (ham). This is the drawback of this technique. The other drawback is that the system doesn't discriminate between one user's view of what is spam and another's. The best you can hope for is a general consensus. To be honest, I wouldn't trust the users to keep the system up to date. I tend to do all the spam control myself. Don't forget that each individual user can still have their own anti-spam tools. Your aim is to keep out the bulk of the spam... you won't be able to completely eradicate it.
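
To illustrate why the ham examples matter, here is a minimal Python sketch of a per-token spam estimate. It is not James's actual code: the tokenizer, the 0.01/0.99 clamp, the neutral 0.5 for unseen tokens and the sample messages are all assumptions chosen for the example.

```python
from collections import Counter


def tokens(text):
    """Very rough tokenizer: lower-case and split on whitespace."""
    return text.lower().split()


def token_spam_probability(token, ham_counts, spam_counts, n_ham, n_spam):
    """Estimate P(spam | token) from how often the token shows up in each corpus."""
    ham_rate = ham_counts[token] / max(n_ham, 1)
    spam_rate = spam_counts[token] / max(n_spam, 1)
    if ham_rate + spam_rate == 0:
        return 0.5  # never seen: treated as neutral in this sketch
    p = spam_rate / (ham_rate + spam_rate)
    return min(0.99, max(0.01, p))  # keep away from the extremes 0 and 1


# Made-up training messages, two per corpus.
ham_msgs = ["meeting agenda attached", "order shipped today"]
spam_msgs = ["cheap pills order today", "cheap watches order now"]
ham = Counter(t for m in ham_msgs for t in tokens(m))
spam = Counter(t for m in spam_msgs for t in tokens(m))

with_ham = token_spam_probability("order", ham, spam, len(ham_msgs), len(spam_msgs))
without_ham = token_spam_probability("order", Counter(), spam, 0, len(spam_msgs))
# with_ham is about 0.67, without_ham is 0.99
```

With no ham corpus, every word that has ever appeared in spam looks almost certainly spammy, including ordinary words such as "order". The ham examples are what pull those everyday words back toward neutral.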

... So what did I do wrong? Doesn't seem to have worked too well 'out of the box'!

The filter works by classifying some email and giving it a score. It doesn't do anything else to it. You have to set up the pipeline to do something with messages which score too highly as spam. Initially, I forwarded all failed emails to the postmaster address so that I could check them manually and forward them to their owner if they were mis-classified. These days the filter is so good I simply throw away anything over a 50% threshold.
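
The split between scoring and acting on the score can be sketched like this in Python. The combining formula is the usual naive-Bayes style product. Whether it matches James's implementation exactly is an assumption, and the function names are made up for the example.

```python
from math import prod


def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities into a single message score."""
    if not token_probs:
        return 0.5  # nothing to go on: neutral score in this sketch
    # Clamp so that a stray 0.0 or 1.0 cannot cause a division by zero.
    probs = [min(0.99, max(0.01, p)) for p in token_probs]
    spam_side = prod(probs)
    ham_side = prod(1.0 - p for p in probs)
    return spam_side / (spam_side + ham_side)


def is_spam(score, threshold=0.50):
    """The decision step: scoring alone does nothing until something tests it."""
    return score > threshold


spammy = combined_spam_probability([0.99, 0.99, 0.2])
hammy = combined_spam_probability([0.01, 0.3])
```

Here `is_spam(spammy)` is true and `is_spam(hammy)` is false. A score of exactly 0.50 is not treated as spam, because the comparison is strictly greater-than.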

Hopefully someone will help walk me out of these woods, I am kinda lost.. Thanks in advance...

It looks like you have done the worst bit already. However, a difficult bit is deciding on how to process email through your pipeline. If it helps, here is a shortened version of my config.xml file showing the Bayesian analysis settings I use. Where I have not shown parts of the file I have marked them with '...' characters. Notice the commented-out section which controls whether messages considered as spam are simply deleted or sent to the postmaster.

Hope this helps.

Regards,
David Legg

------------------------ config.xml ---------------------------------

...
<config>
...
  <spoolmanager>
     <threads> 5 </threads>

     <!-- ROOT PROCESSOR -->
     <processor name="root">
...
        <!-- "not spam" bayesian analysis feeder. -->
        <mailet match="[EMAIL PROTECTED]" class="BayesianAnalysisFeeder">
           <repositoryPath> db://maildb </repositoryPath>
           <feedType>ham</feedType>
           <maxSize>500000</maxSize>
        </mailet>
        <!-- "spam" bayesian analysis feeder. -->
        <mailet match="[EMAIL PROTECTED]" class="BayesianAnalysisFeeder">
           <repositoryPath> db://maildb </repositoryPath>
           <feedType>spam</feedType>
           <maxSize>500000</maxSize>
        </mailet>
...
        <!-- Anti-spam processing -->
        <!-- The following two entries avoid double anti-spam analysis -->
        <!-- for forwarded messages. -->
        <!-- Has spam checking already been done? -->
        <mailet match="HasMailAttribute=spamChecked" class="ToProcessor">
           <processor> transport </processor>
        </mailet>
        <!-- Spam checking will not be done twice -->
        <mailet match="All" class="SetMailAttribute">
           <spamChecked>true</spamChecked>
        </mailet>

        <!-- Messages from authenticated senders are never spam -->
        <mailet match="SMTPAuthSuccessful" class="ToProcessor">
           <processor> transport </processor>
        </mailet>
...
        <!-- Anti spam bayesian analysis -->
        <mailet match="All" class="BayesianAnalysis" onMailetException="ignore">
           <repositoryPath>db://maildb</repositoryPath>
           <maxSize>3000000</maxSize>
           <headerName>X-MessageIsSpamProbability</headerName>
           <ignoreLocalSender>false</ignoreLocalSender>
        </mailet>

        <mailet match="CompareNumericHeaderValue=X-MessageIsSpamProbability > 0.50" class="SetMailAttribute" onMatchException="noMatch">
           <isSpam>true</isSpam>
        </mailet>

        <mailet match="CompareNumericHeaderValue=X-MessageIsSpamProbability > 0.50" class="SetMimeHeader" onMatchException="noMatch">
           <name>X-MessageIsSpam</name>
           <value>true</value>
        </mailet>

        <mailet match="CompareNumericHeaderValue=X-MessageIsSpamProbability > 0.50" class="ToProcessor" onMatchException="noMatch">
           <processor> spam </processor>
           <notice>Spam not accepted</notice>
        </mailet>

        <!-- Send remaining mails to the transport processor for either local or remote delivery -->
        <mailet match="All" class="ToProcessor">
           <processor> transport </processor>
        </mailet>
     </processor>
...
     <processor name="transport">
        <mailet match="SMTPAuthSuccessful" class="SetMimeHeader">
           <name>X-UserIsAuth</name>
           <value>true</value>
        </mailet>
...
     </processor>

     <processor name="spam">
        <mailet match="All" class="Null"/>
        <!-- To notify the postmaster that a message was marked as spam, uncomment this matcher/mailet configuration -->
        <!--
        <mailet match="All" class="NotifyPostmaster"/>
        -->
     </processor>
...
  </spoolmanager>
...
</config>
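
As a reading aid, the order of checks in the root processor above can be modelled in a few lines of Python. This is only a sketch of the visible parts of the config: the `Mail` class and `route_root` function are hypothetical, the feeder mailets and everything hidden behind '...' are left out, and `spam_probability` stands in for whatever BayesianAnalysis would compute.

```python
from dataclasses import dataclass, field


@dataclass
class Mail:
    """Hypothetical stand-in for a message moving through the pipeline."""
    smtp_auth: bool = False
    spam_probability: float = 0.0  # placeholder for the BayesianAnalysis score
    attributes: dict = field(default_factory=dict)
    headers: dict = field(default_factory=dict)


def route_root(mail, threshold=0.50):
    """Return the name of the processor the mail would be handed to next."""
    if mail.attributes.get("spamChecked"):
        return "transport"  # already analysed once, do not score it again
    mail.attributes["spamChecked"] = True
    if mail.smtp_auth:
        return "transport"  # authenticated senders are never treated as spam
    mail.headers["X-MessageIsSpamProbability"] = mail.spam_probability
    if mail.spam_probability > threshold:
        mail.attributes["isSpam"] = True
        mail.headers["X-MessageIsSpam"] = "true"
        return "spam"
    return "transport"  # everything else goes on to normal delivery
```

For example, `route_root(Mail(spam_probability=0.9))` gives "spam", while the same score from an authenticated sender gives "transport" and is never scored.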


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




