Tony Earnshaw wrote:
Shane Kumpf wrote, on 03. mar 2007 19:06:

One of my POP accounts, which I pull down via fetchmail and which isn’t controlled by me, just changed their spam filtering software. In the process they went from bouncing a lot of suspected spam to just quarantining it. Since I don’t use their quarantine, I’m obviously ending up with quite a bit more spam than I used to.

FWIW I'm in more or less the same position as you, doing more or less the same. I'm responsible for a high school email system (1150+ nominal users, around 350 active mailers) running dspam as a daemon with remarkable accuracy. I had my private mail address on the school's server.

For my home machine (Red Hat RHAS4) in November/December last I decided to take my private mail off the school's server and activate my ISP's POP account. The ISP has a spam and virus filtering service, but I don't trust it; I trust myself.

I'm running more or less the same basic system that I have at school, but I use Fetchmail 6.3.6. I have a Postfix 2.3.6 MTA calling amavisd-new 2.4.5 with ClamAV 0.90.1 and BitDefender-Console-Antivirus 7.3-1. A Postfix smtpd listener passes the mail to dspam CVS/MySQL 4.1.20 which scans it and passes back to Postfix, which gives it to maildrop for IMAP distribution.

Unfortunately I'm not getting much spam. I do what I can to aggravate things to get more, like posting on newsgroups (which used to work well) with a throwaway address, but that mostly gets me a few "virus" (also phishing stuff caught by ClamAV).

Dspam is having a lot of trouble classifying these new messages as spam. The strange thing is that all this spam looks very similar to the spam it is catching. I’m starting to wonder if I should wipe my databases and start fresh; it’s been about a month and it doesn’t seem to be getting any better. I’m getting roughly 200 pieces of spam a day now. My stats have dropped considerably, from about 90% accuracy to less than 70%, as you will see. Do you think that if I continue to train it will get better, or that, due to the size and age of my database, this new spam will have trouble getting classified? If there is any info I can provide, let me know.

I decided to start with a completely empty dspam db and see what happened, and I must say I'm pleased with the result up to now. Dspam is learning relatively fast and beginning to judge sensibly, even to the extent that it's interpolating correctly (e.g. if it's had spam in Greek or French it recognizes spam in Spanish but leaves the local Dutch stuff alone; I haven't had any Dutch spam to date, though).


                TP True Positives:          17731
                TN True Negatives:          21733
                FP False Positives:         10937
                FN False Negatives:          6361
                SC Spam Corpusfed:           3741
                NC Nonspam Corpusfed:           1
                TL Training Left:               0
                SHR Spam Hit Rate          73.60%
                HSR Ham Strike Rate:       33.48%
                OCA Overall Accuracy:      69.53%

Mine started all askew but as of now it's:


                TP True Positives:             45
                TN True Negatives:           5940
                FP False Positives:             0
                FN False Negatives:            53
                SC Spam Corpusfed:              1
                NC Nonspam Corpusfed:           0
                TL Training Left:               0
                SHR Spam Hit Rate          45.92%
                HSR Ham Strike Rate:        0.00%
                OCA Overall Accuracy:      99.12%

At school it's:
                TP True Positives:          12914
                TN True Negatives:          87136
                FP False Positives:           384
                FN False Negatives:           344
                SC Spam Corpusfed:           3311
                NC Nonspam Corpusfed:        3002
                TL Training Left:               0
                SHR Spam Hit Rate          97.41%
                HSR Ham Strike Rate:        0.44%
                OCA Overall Accuracy:      99.28%

So not a wild difference between corpus feeding or not. The school gets most correspondence in Dutch and to begin with (starting October last) dspam thought all Dutch stuff was spam (all the corpus was English) and got mixed up, but it's mostly judging well now.
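
For anyone who wants to check the percentages in the three stat blocks above, they appear to follow directly from the four raw counters. Here is a minimal Python sketch, assuming SHR = TP/(TP+FN), HSR = FP/(FP+TN) and OCA = (TP+TN)/(TP+TN+FP+FN). These formulas are inferred from the posted figures, which they reproduce; they are not taken from the dspam source. The class name, method names and mailbox labels are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DspamCounters:
    """Raw counters as printed in the stat blocks (hypothetical container)."""
    tp: int  # True Positives: spam caught
    tn: int  # True Negatives: ham delivered
    fp: int  # False Positives: ham flagged as spam
    fn: int  # False Negatives: spam missed

    def spam_hit_rate(self) -> float:
        # Share of all spam that was caught (assumed formula).
        spam = self.tp + self.fn
        return 100.0 * self.tp / spam if spam else 0.0

    def ham_strike_rate(self) -> float:
        # Share of all ham that was wrongly flagged (assumed formula).
        ham = self.fp + self.tn
        return 100.0 * self.fp / ham if ham else 0.0

    def overall_accuracy(self) -> float:
        # Share of all messages that were classified correctly (assumed formula).
        total = self.tp + self.tn + self.fp + self.fn
        return 100.0 * (self.tp + self.tn) / total if total else 0.0


# Counters copied from the three stat blocks in this thread.
MAILBOXES = {
    "shane": DspamCounters(tp=17731, tn=21733, fp=10937, fn=6361),
    "tonni-home": DspamCounters(tp=45, tn=5940, fp=0, fn=53),
    "tonni-school": DspamCounters(tp=12914, tn=87136, fp=384, fn=344),
}

if __name__ == "__main__":
    for name, c in MAILBOXES.items():
        print(f"{name}: SHR {c.spam_hit_rate():.2f}%  "
              f"HSR {c.ham_strike_rate():.2f}%  "
              f"OCA {c.overall_accuracy():.2f}%")
```

Seen this way, the home box's 99.12% overall accuracy is carried almost entirely by the 5940 true negatives. Its spam hit rate is still under 50% because it has seen fewer than 100 spams so far.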

I'm using a shared group for both sites and my home dspam.conf looks like:

Home /var/dspam
DeliveryHost        127.0.0.1
DeliveryPort        10026
DeliveryIdent       dspam-out
DeliveryProto       SMTP
FallbackDomains on
OnFail error
Trust root
Trust nobody
Debug *
DebugOpt process spam fp innocent
TrainingMode toe
TestConditionalTraining on
Feature tb=3
Feature whitelist
Feature noise
Algorithm graham burton
PValue graham
SupressWebStats on
ImprobabilityDrive on
Preference "signatureLocation=headers"  # 'message' or 'headers'
AllowOverride trainingMode
AllowOverride spamAction spamSubject
AllowOverride statisticalSedation
AllowOverride enableBNR
AllowOverride enableWhitelist
AllowOverride showFactors
AllowOverride optIn optOut
AllowOverride whitelistThreshold
AllowOverride makeCorpus
AllowOverride fallbackDomain
AllowOverride trainingMode
MySQLServer     /var/lib/mysql/mysql.sock
MySQLUser               dspam
MySQLPass               dspam
MySQLDb                 dspamdb
MySQLConnectionCache    10
IgnoreHeader DomainKey-Signature
IgnoreHeader X-DKIM
IgnoreHeader X-Virus-Scanned
IgnoreHeader Delivered-To
IgnoreHeader In-Reply-To
IgnoreHeader X-OriginalArrivalTime
IgnoreHeader X-Disclaimer
IgnoreHeader X-Mailman-Approved-At
IgnoreHeader Archive
IgnoreHeader List-Post
IgnoreHeader List-Subscribe
IgnoreHeader List-Unsubscribe
IgnoreHeader List-Help
IgnoreHeader List-Id
IgnoreHeader Message-ID
Notifications   on
PurgeSignatures 21          # Stale signatures
PurgeNeutral    90          # Tokens with neutralish probabilities
PurgeUnused     90          # Unused tokens
PurgeHapaxes    30          # Tokens with less than 5 hits (hapaxes)
PurgeHits1S     15          # Tokens with only 1 spam hit
PurgeHits1I     15          # Tokens with only 1 innocent hit
LocalMX 127.0.0.1 192.168.0.3 213.75.3.22 213.10.163.78
SystemLog on
UserLog   on
Opt out
TrackSources spam
Broken lineStripping
MaxMessageSize 1024000
ServerHost              127.0.0.1
ServerPort              24
ServerQueueSize 32
ServerPID               /var/run/dspam.pid
ServerMode standard
ServerParameters       "--deliver=innocent,spam -d %u"
ServerIdent            "dspam-in"
ProcessorBias on
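
If you want to compare a config like the one above against your own, a rough way is to read it into a dictionary of directive name to list of values. This is a minimal Python sketch, not dspam's own parser. It assumes one directive per line, with the name separated from the value by whitespace and with `#` starting a comment (so a `#` inside a quoted value would be mishandled). The function names and the sample excerpt are only for illustration.

```python
from collections import defaultdict


def parse_dspam_conf(text):
    """Map each directive name to the list of values it was given, in order."""
    directives = defaultdict(list)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comment (assumption)
        if not line:
            continue
        parts = line.split(None, 1)          # name, then the rest of the line
        value = parts[1].strip() if len(parts) > 1 else ""
        directives[parts[0]].append(value)
    return dict(directives)


def repeated_values(directives):
    """Return {name: [values]} for values listed more than once under a name."""
    repeats = {}
    for name, values in directives.items():
        dupes = sorted({v for v in values if values.count(v) > 1})
        if dupes:
            repeats[name] = dupes
    return repeats


# A few lines excerpted from the config posted above.
SAMPLE = """\
Trust root
Trust nobody
TrainingMode toe
Preference "signatureLocation=headers"  # 'message' or 'headers'
AllowOverride trainingMode
AllowOverride spamAction spamSubject
AllowOverride trainingMode
PurgeSignatures 21          # Stale signatures
"""

if __name__ == "__main__":
    conf = parse_dspam_conf(SAMPLE)
    print(conf["Trust"])
    print(repeated_values(conf))
```

On the excerpt it flags `AllowOverride trainingMode`, which is listed twice in the posted config. That repeat is presumably harmless, but it is the kind of thing that is easy to miss by eye.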

Best,

--Tonni

(  Holy cow!  I have never had accuracy that high. :(  )
