Re: [AMaViS-user] amavisd-new + dspam...

Nathanael Hoyle Fri, 07 Oct 2005 12:11:40 -0700

Gary V wrote:
> Nathanael wrote:
> 
> 
>>Can anyone direct me to reasonably detailed information on how
>>amavisd-new works together with dspam?  Can I take advantage of all of
>>the dspam featureset if using it within amavisd?  I have looked and
>>can't find any real documentation on the way they interoperate and how
>>much of the full power of dspam is available when used with amavisd.
> 
> 
>>Thanks,
> 
> 
> I have not used dspam outside amavisd-new, but I would have to assume
> it is somewhat crippled.
> 
> I have recently set it up on my system, and I am still in the process
> of letting SpamAssassin train dspam (trying to get to 2500 messages).
> 
> I gather that in a typical system, you must manually train dspam for
> it to work with any accuracy. Because dspam has not been trained,
> everything dspam sees initially is considered innocent. amavisd-new
> hard-codes a ham level (< 0.5) and a spam level (> 7.0) that are
> used to retrain dspam. The message is fed to dspam, dspam returns
> 'innocent' or 'spam', then the SpamAssassin score is used to determine if
> dspam needs to be retrained. If it does, the message is fed back
> through dspam to retrain it. This bit of code shows some of this:
> 
> if (defined $dspam && $dspam ne '' && defined $spam_level) {  # auto-learn
>       my($eat,@options);
>       @options = (qw(--stdout --mode=tum --user), $daemon_user);  # --mode=teft
>       if (   $spam_level >  7.0 && $dspam_result eq 'Innocent') {
>         $eat = 'SPAM'; push(@options, qw(--class=spam --source=error));
>       }
>       elsif ($spam_level <  0.5 && $dspam_result eq 'Spam') {
>         $eat = 'HAM'; push(@options, qw(--class=innocent --source=error));
>       }
> 
> I actually changed the numbers to 0.2 and 8.0 on my system.
> 
> Since dspam assumes everything is innocent at first, it is nearly always
> retrained on spam (which I believe is the way you train dspam). Here are
> my current stats:
> 
>                 TS Total Spam:                124
>                 TI Total Innocent:            880
>                 SM Spam Misclassified:        118
>                 IM Innocent Misclassified:      0
>                 SC Spam Corpusfed:              0
>                 IC Innocent Corpusfed:          0
>                 TL Training Left:            1620
>                 SR Spam Catch Rate:        51.24%
>                 IR Innocent Catch Rate:   100.00%
>                 OR Overall Rate/Accuracy:  89.48%
> 
> At the moment, in local.cf, I have placed:
> 
> header DSPAM_SPAM X-DSPAM-Result =~ /^Spam$/
> describe DSPAM_SPAM DSPAM claims it is spam
> score DSPAM_SPAM 0.5
> 
> header DSPAM_HAM X-DSPAM-Result =~ /^Innocent$/
> describe DSPAM_HAM DSPAM claims it is ham
> score DSPAM_HAM -0.1
> 
> So as you can see, the dspam result is used by SpamAssassin and
> initially we are very conservative with the numbers. Once dspam is
> fully trained, dspam's accuracy will improve, and at that point dspam can
> be relied upon to the point that the DSPAM_SPAM (and DSPAM_HAM) can be
> given scores that will make dspam more effective
> (I'm thinking 2.0 - 3.5).
> 
> Here is a sample header of a message that was passed to a user. Note
> that it scored below 8.0, so amavisd-new did not use this message to
> retrain dspam. Note the Bayes score (SpamAssassin could not decide if
> it was spam or ham).
> 
> X-DSPAM-Result: Innocent
> X-DSPAM-Confidence: 0.9997
> X-DSPAM-Probability: 0.0000
> X-DSPAM-Signature: 4343db5590271237217540
> X-DSPAM-Factors: 27,
> X-Virus-Scanned: amavisd-new at example.com
> X-Spam-Status: Yes, score=5.136 required=5 tests=[BAYES_50=0.001,
>  DCC_CHECK=2.169, DIGEST_MULTIPLE=0.098, DSPAM_HAM=-0.1,
>  MARKETING_PARTNERS=1.401, RAZOR2_CF_RANGE_51_100=0.056, RAZOR2_CHECK=1.511]
> X-Spam-Score: 5.136
> X-Spam-Level: *****
> X-Spam-Flag: YES
> 
> A side note:
> I first used BDB as the storage method. Any message with over 15K
> of text in the message body caused the child process to time out. I
> switched to MySQL 4.1, and of course the problem was solved. It was
> not trivial (for me) to set this up, and for some reason I cannot get
> it to work on one of my machines, but two others work great.
> 
> After installing all the required MySQL libraries, on my Debian
> machine I compiled dspam with:
> 
> ./configure --with-storage-driver=mysql_drv --with-mysql-libraries=/usr/lib \
>   --with-mysql-includes=/usr/include/mysql --enable-virtual-users \
>   --with-dspam-home=/var/lib/amavis/dspam --enable-signature-headers \
>   --without-delivery-agent --without-quarantine-agent --enable-debug
> 
> Paths will vary. I would be interested in how others may have
> compiled it for use with MySQL 4.1, as this was the part I was least
> certain about.
> 
> Gary V
> 
> 
> 
> -------------------------------------------------------
> This SF.Net email is sponsored by:
> Power Architecture Resource Center: Free content, downloads, discussions,
> and more. http://solutions.newsforge.com/ibmarch.tmpl
> _______________________________________________
> AMaViS-user mailing list
> AMaViS-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/amavis-user
> AMaViS-FAQ:http://www.amavis.org/amavis-faq.php3
> AMaViS-HowTos:http://www.amavis.org/howto/


This may be a stupid question, but here goes anyway.  As I understand
it, dspam starts with a blank slate as far as training data goes, and
therefore also as far as decision-making capability.  One has the option
to corpus-feed it, but unless you have a corpus from your own email
traffic, you will be feeding it other people's mail and spam/ham
decisions, which may not closely resemble your own (and may therefore
improperly bias dspam).

Assuming no corpus-feeding, suppose dspam is trained solely on SA
scoring decisions.  If we also assume (I know, assumption is the mother
of something or other...) that dspam's Bayesian token analysis can
clearly learn the patterns that SA is recognizing, then it would seem to
me that in the end you have trained dspam (via SA) to make the same
decisions that SA would have made.  This means two things to me:

1) dspam provides no additional benefit when used with SA because it
will have been trained by SA to make the same decision that SA itself
would have made.

2) having dspam decisions adjust SA scores just magnifies the SA score
in either direction, i.e. it raises an already-spam score or drops an
already-clean score, and it increases the risk of false positives (FPs).
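
To put rough numbers on point 2, here is a minimal Python sketch (not amavisd-new or SpamAssassin code) that re-adds the test scores from the sample X-Spam-Status header quoted above and swaps the dspam rule in and out. The rule scores 0.5 and -0.1 come from the quoted local.cf; the helper name `total_with_dspam` and the dict layout are my own and purely illustrative.

```python
# Test scores copied from the sample X-Spam-Status header quoted above,
# with the DSPAM_HAM contribution left out so it can be swapped in and out.
BASE_TESTS = {
    "BAYES_50": 0.001,
    "DCC_CHECK": 2.169,
    "DIGEST_MULTIPLE": 0.098,
    "MARKETING_PARTNERS": 1.401,
    "RAZOR2_CF_RANGE_51_100": 0.056,
    "RAZOR2_CHECK": 1.511,
}

# Rule scores from the quoted local.cf (DSPAM_HAM and DSPAM_SPAM).
DSPAM_RULE_SCORES = {"Innocent": -0.1, "Spam": 0.5}


def total_with_dspam(dspam_result=None):
    """Sum the test scores, optionally adding the dspam rule (hypothetical helper)."""
    total = sum(BASE_TESTS.values())
    if dspam_result is not None:
        total += DSPAM_RULE_SCORES[dspam_result]
    return round(total, 3)


if __name__ == "__main__":
    # No dspam rule, then each possible dspam verdict.
    for verdict in (None, "Innocent", "Spam"):
        print(verdict, total_with_dspam(verdict))
```

With these conservative scores the dspam verdict moves the message by at most 0.6 points. If a rule carried a weight in the 2.0 - 3.5 range Gary mentions, a single dspam verdict could be enough to move a borderline message like this one across the required=5 line.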

Now, it is certainly possible/plausible that dspam will pick up on
characteristics of an email that SA has flagged as spam, and that those
characteristics would let it catch other spam which SA had not flagged.
However, if SA is continually retraining dspam, then logically SA would
on those subsequent emails train dspam *not* to recognize them as spam,
since SA itself did not see them as such.
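
For reference, the retraining rule in the amavisd-new excerpt quoted above boils down to the following. This is a minimal Python paraphrase of that Perl fragment, not the actual amavisd-new code; the function name `retrain_action` is made up, and the thresholds are the 7.0 and 0.5 shown in the excerpt.

```python
SPAM_LEVEL = 7.0  # SA score above which an 'Innocent' verdict counts as an error
HAM_LEVEL = 0.5   # SA score below which a 'Spam' verdict counts as an error


def retrain_action(spam_level, dspam_result):
    """Return 'SPAM', 'HAM', or None (no retraining), mirroring the quoted excerpt."""
    if spam_level > SPAM_LEVEL and dspam_result == "Innocent":
        return "SPAM"   # fed back with --class=spam --source=error
    if spam_level < HAM_LEVEL and dspam_result == "Spam":
        return "HAM"    # fed back with --class=innocent --source=error
    return None         # dspam agreed with SA, or the SA score was in the middle band


if __name__ == "__main__":
    # The sample message quoted above: SA score 5.136, dspam said Innocent.
    print(retrain_action(5.136, "Innocent"))
    # Two made-up cases at the extremes, where SA overrides dspam.
    print(retrain_action(9.2, "Innocent"))
    print(retrain_action(0.1, "Spam"))
```

If I am reading the excerpt right, SA only corrects dspam at the extremes: anything scoring between 0.5 and 7.0 is never fed back as an error, so the un-training I describe above would apply only to mail that SA scores below 0.5.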

Either I'm really missing something here, or it sounds to me like this
is a useless addition when set up this way.  The other possibility of
course would be to use SA for initial training of dspam and then
disconnect them and allow per-user training of dspam, but I'd have to
investigate more how well that would work as well as how to put a
standalone dspam into my Postfix->amavisd->clamav->spamassassin->postfix
chain (which uses virtual users, no system accounts).

-- 
Nathanael Hoyle
Systems and Networking
Speed Express Networks, LLC
[EMAIL PROTECTED]
432.837.2811


