Re: [Dspam-user] high level of missed ham, but all factors at 0.01000

Sven Karlsson Wed, 26 Aug 2009 12:42:05 -0700

On Wed, Aug 26, 2009 at 1:12 PM, Steve<[email protected]> wrote:

>> New email; one admin goes through a global mailbox and retrains the
>> obvious missed spam and hams. This means that not all FP/FN are
>> retrained, but it should be OK since it's TOE training (even though
>> some accuracy is lost). It also means that training may be focused on,
>> for example, certain days of the week (the admin doing the training is
>> more alert when starting at the Monday emails, but may stop training
>> at the Wednesday emails, leaving Thursday-Sunday untrained). This may
>> give an unfair balance, I assume.
>>
> And how is that global mailbox connected/related to other users? Is that 
> global mailbox the mailbox of your global user? Does retraining there benefit 
> other users or is it just for one user?



Incoming mail goes through postfix to maildrop, and
/etc/courier/maildroprc is parsed for each incoming mail.
Here the mail is sent through dspam in client mode, and tagged
spam/innocent, in TOE mode. I.e. no training here.

Then the mail is sent through SpamAssassin (ouch, heresy! I hope to be
able to skip it later on :), mainly because of the RBL functions.
The URI RBL (SURBL) catches a lot of spam that usually only includes a
link to some newly registered site. Anyway, SpamAssassin adds to the
score if dspam indicates spam, and if the total is over 5 points, the
mail is filtered. An innocent dspam tag is given a negative score.

A copy of the mail is then delivered to the dspam-admin mailbox, and
the mail itself is delivered to the user's inbox or the spambox. No
quarantining etc. is used. The problem is that only a few users bother
to look in the spambox, most being unaware of it since they mostly POP
the inbox. There is a webmail interface where I just now added
spam/not spam buttons, to give more control to the user.

So the idea is/was to train the global user for the benefit of all
users, by going through the mailbox and retraining as necessary (via
source=error or source=corpus as described earlier).

For example, if one user got a mail saying "beebopp", I would like
this token to also be recognized for all other users. This may sound
like a merged group, but I would like the admin to decide which mails
should be globally trained and which should be individually trained.
Some users may like medical pills, but if they retrain those mails
themselves it should only affect them and not all users.
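
To make the split between admin-decided global training and per-user
training concrete, here is a minimal sketch of a wrapper. The function
name retrain, the DSPAM_BIN and GLOBAL_USER variables, and the "global"
keyword are my own illustrative assumptions, not anything dspam
provides; the dspam flags are the same ones used in the commands quoted
in this message, and the class value is passed through unchanged.

```shell
#!/bin/sh
# Hypothetical wrapper: retrain one message either for the shared global
# user or for a single named user. All names here are assumptions.
DSPAM_BIN="${DSPAM_BIN:-/usr/bin/dspam}"
GLOBAL_USER="${GLOBAL_USER:-globaluser}"

retrain() {
    # $1 = "global" or a user name, $2 = class, $3 = path to the message
    scope=$1
    class=$2
    mail=$3
    if [ "$scope" = "global" ]; then
        user=$GLOBAL_USER      # admin decided this mail matters to everyone
    else
        user=$scope            # only this user's tokens are affected
    fi
    "$DSPAM_BIN" --source=error --class="$class" --user "$user" < "$mail"
}
```

Under these assumptions the admin would call something like
retrain global ham "$mail", while the webmail button would call
retrain with the name of the logged-in user instead.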

This was the idea behind doing a

  cat "$mail" | /usr/bin/dspam --source=error --class=ham --user globaluser

or

  cat "$mail" | grep -iv x-dspam | grep -iv x-spam | \
    /usr/bin/dspam --source=corpus --class=ham --user globaluser

for the admin, and then doing

  cat "$mail" | /usr/bin/dspam --source=error --class=ham

for the user-retrained mails.
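
One side effect of the grep pipeline above is that it drops every line
mentioning x-dspam or x-spam, body lines included, and it leaves any
folded continuation lines of those headers behind. A minimal sketch of
a filter that only touches the header block could look like this. The
function name strip_filter_headers is my own, and I am assuming the
headers to remove all start with X-DSPAM or X-Spam.

```shell
#!/bin/sh
# Hypothetical filter: remove X-DSPAM*/X-Spam* headers (and their folded
# continuation lines) from the header block only; the body is untouched.
strip_filter_headers() {
    awk '
        BEGIN { in_hdr = 1; skip = 0 }
        in_hdr && /^$/ { in_hdr = 0 }          # first blank line ends headers
        in_hdr {
            if (/^[ \t]/) {                    # folded continuation line
                if (skip) next
            } else {
                skip = (tolower($0) ~ /^x-(dspam|spam)/)
            }
            if (skip) next
        }
        { print }
    '
}
```

It would replace the two grep calls, for example
strip_filter_headers < "$mail" | /usr/bin/dspam ... with the same dspam
arguments as before.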

Or are there other suggestions for how such a scenario could be set up?

BR
 Sven

