Thanks David for all your help! I really appreciated it. I think I have
the Bayesian filter running fine now, and it does seem to be getting
better at sorting out spam from the good stuff, especially as I send it
feedback on the spam it misses and the good mail it mistakenly flags as
spam. I have set up James to send to me, wearing my "postmaster" hat,
everything that the Bayesian filter thinks is spam. Occasionally I do
find an email that was not really spam, and then I have to do two things
with it: send it back to James's nospam mailet, and forward it on to the
user who should have gotten it. This latter step is a bit of a pain
because I usually have to clean up the email (or forward it as is).
Is there an easy way to send the original version of an email that was
accidentally marked as spam on to the user? I hope I don't have to keep
double checking all this spam, I get thousands daily... How do you and
others handle the spam that the Bayesian filter catches? Suggestions
welcome, this could get rather tedious for me...
Marc...
David Legg wrote:
Hi Marc,
My server is already set up to use SMTP AUTH. I have a single domain
name that I have purchased from Network Solutions, let's call it
mydomain.com. I have listed mydomain.com in the <servernames> section
of the James config.xml file. I want this server to service both
internal and external users. So what exactly is this suggestion
asking me to do? Do I need to purchase another domain name in order
to run this Bayesian Analysis mailet? That does not make sense to me...
No, there is no need to purchase another domain. The instructions are
suggesting that you choose email addresses which are impossible for
outsiders to use in order to tell the server what is ham and what is
spam. For example, in my server I literally chose '[EMAIL PROTECTED]' and
not.'[EMAIL PROTECTED]'. Not only are these unlikely to be real domains
but anyone trying to send emails addressed to these addresses will be
challenged for an SMTP password. Thus only authorized users will be
able to train the Bayesian analysis filter.
2. I am not sure I fully understand the concept of having both a spam
and a ham feedback to the Bayesian Analyzer. Spam I can understand,
that is used to teach the analyzer what is spam. But why have a ham
feedback? Do the users have to teach the analyzer what is good email
also??? That seems like an extraordinary burden to place on them.
It is true that the Bayesian filter works best with examples of both
spam and not spam (ham). This is the drawback of this technique. The
other drawback is that the system doesn't discriminate between one
user's view of what is spam and another's. The best you can hope for
is a general consensus. To be honest I wouldn't trust the users to
keep the system up to date. I tend to do all the spam control
myself. Don't forget that each individual user can still have their
own anti-spam tools. Your aim is to keep out the bulk of the spam...
you won't be able to completely eradicate it.
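To illustrate why the filter needs examples of both ham and spam, here
is a minimal Python sketch of token-based Bayesian scoring in the style
of Paul Graham's "A Plan for Spam". It is a simplified illustration, not
the code inside the BayesianAnalysis mailet: the function names, the
whitespace tokenizer, and the 0.4 default for unseen tokens are all
assumptions made for this example.

```python
from collections import Counter


def token_counts(messages):
    """Count token occurrences across a list of message strings."""
    counts = Counter()
    for text in messages:
        counts.update(text.lower().split())
    return counts


def token_spam_probability(token, ham_counts, spam_counts, n_ham, n_spam):
    """Estimate how strongly one token points to spam, clamped to 0.01..0.99."""
    good = 2 * ham_counts[token]  # ham hits count double to limit false positives
    bad = spam_counts[token]
    if good + bad == 0:
        return 0.4  # assumed default for a token never seen in training
    ham_rate = min(1.0, good / n_ham)
    spam_rate = min(1.0, bad / n_spam)
    p = spam_rate / (ham_rate + spam_rate)
    return max(0.01, min(0.99, p))


def spam_score(text, ham_counts, spam_counts, n_ham, n_spam):
    """Combine the per-token probabilities into one score between 0 and 1."""
    if n_ham == 0 or n_spam == 0:
        # Without both corpora the per-token ratio cannot be computed.
        raise ValueError("need both ham and spam training examples")
    prod_spam = 1.0
    prod_ham = 1.0
    for token in set(text.lower().split()):
        p = token_spam_probability(token, ham_counts, spam_counts, n_ham, n_spam)
        prod_spam *= p
        prod_ham *= 1.0 - p
    return prod_spam / (prod_spam + prod_ham)
```

Each token's probability is a ratio of how often it shows up in spam
versus ham, so with only one of the two corpora there is nothing to
compare against. That is why the feeder needs both kinds of feedback.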
... So what did I do wrong? Doesn't seem to have worked too well 'out
of the box'!
The filter works by classifying some email and giving it a score. It
doesn't do anything else to it. You have to set up the pipeline to do
something with messages which score too highly as spam. Initially, I
forwarded all failed emails to the postmaster address so that I could
check them manually and forward them to their owner if they were
mis-classified. These days the filter is so good I simply throw away
anything over a 50% threshold.
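The routing decision described here can be sketched in a few lines of
Python. This only mimics the logic for illustration; in James the work
is done by matchers and mailets in config.xml. The function name
choose_processor and its parameters are hypothetical, and treating a
missing or unparsable header as "not spam" is an assumption for this
sketch.

```python
def choose_processor(headers, authenticated=False, threshold=0.50):
    """Return the name of the processor a scored message should go to."""
    if authenticated:
        # Mail from authenticated senders is never treated as spam.
        return "transport"
    raw = headers.get("X-MessageIsSpamProbability")
    try:
        score = float(raw)
    except (TypeError, ValueError):
        # Header missing or not a number: fall through to normal delivery.
        return "transport"
    # Strictly greater than the threshold counts as spam.
    return "spam" if score > threshold else "transport"
```

The filter itself only writes the score. What happens to a message
above the threshold (delete it, or pass it to the postmaster for a
manual check) is decided by whatever the "spam" processor is set up to
do.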
Hopefully someone will help walk me out of these woods, I am kinda
lost. Thanks in advance...
It looks like you have done the worst bit already. However, a
difficult bit is deciding how to process email through your
pipeline. If it helps, here is a shortened version of my config.xml
file showing the Bayesian analysis settings I use. Where I have not
shown parts of the file I have marked them with '...' characters.
Notice the commented-out section which controls whether messages
considered as spam are simply deleted or sent to the postmaster.
Hope this helps.
Regards,
David Legg
------------------------ config.xml ---------------------------------
...
<config>
  ...
  <spoolmanager>
    <threads> 5 </threads>
    <!-- ROOT PROCESSOR -->
    <processor name="root">
      ...
      <!-- "not spam" bayesian analysis feeder. -->
      <mailet match="[EMAIL PROTECTED]"
              class="BayesianAnalysisFeeder">
        <repositoryPath> db://maildb </repositoryPath>
        <feedType>ham</feedType>
        <maxSize>500000</maxSize>
      </mailet>
      <!-- "spam" bayesian analysis feeder. -->
      <mailet match="[EMAIL PROTECTED]"
              class="BayesianAnalysisFeeder">
        <repositoryPath> db://maildb </repositoryPath>
        <feedType>spam</feedType>
        <maxSize>500000</maxSize>
      </mailet>
      ...
      <!-- Anti-spam processing -->
      <!-- The following two entries avoid double anti-spam analysis -->
      <!-- for forwarded messages. -->
      <!-- Has spam checking already been done? -->
      <mailet match="HasMailAttribute=spamChecked" class="ToProcessor">
        <processor> transport </processor>
      </mailet>
      <!-- Spam checking will not be done twice -->
      <mailet match="All" class="SetMailAttribute">
        <spamChecked>true</spamChecked>
      </mailet>
      <!-- Messages from authenticated senders are never spam -->
      <mailet match="SMTPAuthSuccessful" class="ToProcessor">
        <processor> transport </processor>
      </mailet>
      ...
      <!-- Anti spam bayesian analysis -->
      <mailet match="All" class="BayesianAnalysis"
              onMailetException="ignore">
        <repositoryPath>db://maildb</repositoryPath>
        <maxSize>3000000</maxSize>
        <headerName>X-MessageIsSpamProbability</headerName>
        <ignoreLocalSender>false</ignoreLocalSender>
      </mailet>
      <mailet
          match="CompareNumericHeaderValue=X-MessageIsSpamProbability > 0.50"
          class="SetMailAttribute" onMatchException="noMatch">
        <isSpam>true</isSpam>
      </mailet>
      <mailet
          match="CompareNumericHeaderValue=X-MessageIsSpamProbability > 0.50"
          class="SetMimeHeader" onMatchException="noMatch">
        <name>X-MessageIsSpam</name>
        <value>true</value>
      </mailet>
      <mailet
          match="CompareNumericHeaderValue=X-MessageIsSpamProbability > 0.50"
          class="ToProcessor" onMatchException="noMatch">
        <processor> spam </processor>
        <notice>Spam not accepted</notice>
      </mailet>
      <!-- Send remaining mails to the transport processor for
           either local or remote delivery -->
      <mailet match="All" class="ToProcessor">
        <processor> transport </processor>
      </mailet>
    </processor>
    ...
    <processor name="transport">
      <mailet match="SMTPAuthSuccessful" class="SetMimeHeader">
        <name>X-UserIsAuth</name>
        <value>true</value>
      </mailet>
      ...
    </processor>
    <processor name="spam">
      <mailet match="All" class="Null"/>
      <!-- To notify the postmaster that a message was marked as
           spam, uncomment this matcher/mailet configuration -->
      <!--
      <mailet match="All" class="NotifyPostmaster"/>
      -->
    </processor>
    ...
  </spoolmanager>
  ...
</config>
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]