Shane Kumpf wrote, on 03. mar 2007 19:06:
One of my POP accounts, which I pull down via fetchmail and which isn’t
controlled by me, just changed their spam filtering software. In the
process of this change they went from bouncing a lot of suspected spam
to just quarantining it. Since I don’t use their quarantine,
obviously I’m ending up with quite a bit more spam than I used to.
FWIW I'm in more or less the same position as you, doing more or less
the same. I'm responsible for a high school email system (1150+ nominal
users, around 350 active mailers) running dspam as a daemon with
remarkable accuracy. I had my private mail address on the school's server.
For my home machine (Red Hat RHAS4) in November/December last I decided
to take my private mail off the school's server and activate my ISP's
POP account. The ISP has a spam and virus filtering service but I don't
trust it; I trust myself.
I'm running more or less the same basic system that I have at school,
but I use Fetchmail 6.3.6. I have a Postfix 2.3.6 MTA calling
amavisd-new 2.4.5 with ClamAV 0.90.1 and BitDefender-Console-Antivirus
7.3-1. A Postfix smtpd listener passes the mail to dspam CVS/MySQL
4.1.20 which scans it and passes back to Postfix, which gives it to
maildrop for IMAP distribution.
Unfortunately I'm not getting much spam. I do what I can to aggravate
things to get more, like posting on newsgroups (which used to work well)
with a throwaway address, but that mostly gets me a few "virus" (also
phishing stuff caught by ClamAV).
Dspam is having a lot of trouble classifying these new messages as spam.
The strange thing is that all this spam looks very similar to the spam
it is catching. I’m starting to wonder if I should wipe my databases
and start fresh; it’s been about a month and it doesn’t seem to be
getting any better. I’m getting roughly 200 pieces of spam a day now.
My stats have dropped considerably, from about 90% accuracy to less than
70%, as you will see. Do you think that if I continue to train it will get
better, or that due to the size and age of my database this new spam will
have trouble getting classified? If there is any info I can provide, let me know.
I decided to start with a completely empty dspam db and see what
happened, and I must say I'm pleased with the result up to now: dspam is
learning relatively fast and beginning to judge sensibly, even to the
extent that it's interpolating correctly (e.g. if it's had spam in Greek
or French it recognizes spam in Spanish but leaves the local Dutch stuff
alone - I haven't had any Dutch spam to date, though).
TP True Positives: 17731
TN True Negatives: 21733
FP False Positives: 10937
FN False Negatives: 6361
SC Spam Corpusfed: 3741
NC Nonspam Corpusfed: 1
TL Training Left: 0
SHR Spam Hit Rate 73.60%
HSR Ham Strike Rate: 33.48%
OCA Overall Accuracy: 69.53%
Mine started all askew but as of now it's:
TP True Positives: 45
TN True Negatives: 5940
FP False Positives: 0
FN False Negatives: 53
SC Spam Corpusfed: 1
NC Nonspam Corpusfed: 0
TL Training Left: 0
SHR Spam Hit Rate 45.92%
HSR Ham Strike Rate: 0.00%
OCA Overall Accuracy: 99.12%
At school it's:
TP True Positives: 12914
TN True Negatives: 87136
FP False Positives: 384
FN False Negatives: 344
SC Spam Corpusfed: 3311
NC Nonspam Corpusfed: 3002
TL Training Left: 0
SHR Spam Hit Rate 97.41%
HSR Ham Strike Rate: 0.44%
OCA Overall Accuracy: 99.28%
So there's not a wild difference between corpus feeding and not. The school gets
most correspondence in Dutch and to begin with (starting October last)
dspam thought all Dutch stuff was spam (all the corpus was English) and
got mixed up, but it's mostly judging well now.
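For anyone wanting to compare the three sets of figures, the percentages can be recomputed from the raw counts. This is only a sketch: the function name dspam_rates is made up for illustration, and the formulas are inferred from the fact that they reproduce the SHR, HSR and OCA values quoted above, not taken from the dspam source.

```python
def dspam_rates(tp, tn, fp, fn):
    """Return (spam hit rate, ham strike rate, overall accuracy) in percent.

    Assumed formulas (they match the figures quoted in this thread):
      SHR = TP / (TP + FN)   share of all spam that was caught
      HSR = FP / (FP + TN)   share of all ham wrongly flagged as spam
      OCA = (TP + TN) / all  share of all messages judged correctly
    """
    spam_total = tp + fn
    ham_total = fp + tn
    total = spam_total + ham_total
    shr = 100.0 * tp / spam_total if spam_total else 0.0
    hsr = 100.0 * fp / ham_total if ham_total else 0.0
    oca = 100.0 * (tp + tn) / total if total else 0.0
    return shr, hsr, oca


# Counts copied from the three stats listings: (TP, TN, FP, FN)
counts = {
    "Shane": (17731, 21733, 10937, 6361),
    "home": (45, 5940, 0, 53),
    "school": (12914, 87136, 384, 344),
}

for name, c in counts.items():
    shr, hsr, oca = dspam_rates(*c)
    print(f"{name:7s} SHR {shr:6.2f}%  HSR {hsr:6.2f}%  OCA {oca:6.2f}%")
```

It shows why overall accuracy on its own can mislead: the home setup scores above 99% overall although it catches less than half the spam, simply because spam is such a small share of its mail.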
I'm using a shared group for both sites and my home dspam.conf looks like:
Home /var/dspam
DeliveryHost 127.0.0.1
DeliveryPort 10026
DeliveryIdent dspam-out
DeliveryProto SMTP
FallbackDomains on
OnFail error
Trust root
Trust nobody
Debug *
DebugOpt process spam fp innocent
TrainingMode toe
TestConditionalTraining on
Feature tb=3
Feature whitelist
Feature noise
Algorithm graham burton
PValue graham
SupressWebStats on
ImprobabilityDrive on
Preference "signatureLocation=headers" # 'message' or 'headers'
AllowOverride trainingMode
AllowOverride spamAction spamSubject
AllowOverride statisticalSedation
AllowOverride enableBNR
AllowOverride enableWhitelist
AllowOverride showFactors
AllowOverride optIn optOut
AllowOverride whitelistThreshold
AllowOverride makeCorpus
AllowOverride fallbackDomain
AllowOverride trainingMode
MySQLServer /var/lib/mysql/mysql.sock
MySQLUser dspam
MySQLPass dspam
MySQLDb dspamdb
MySQLConnectionCache 10
IgnoreHeader DomainKey-Signature
IgnoreHeader X-DKIM
IgnoreHeader X-Virus-Scanned
IgnoreHeader Delivered-To
IgnoreHeader In-Reply-To
IgnoreHeader X-OriginalArrivalTime
IgnoreHeader X-Disclaimer
IgnoreHeader X-Mailman-Approved-At
IgnoreHeader Archive
IgnoreHeader List-Post
IgnoreHeader List-Subscribe
IgnoreHeader List-Unsubscribe
IgnoreHeader List-Help
IgnoreHeader List-Id
IgnoreHeader Message-ID
Notifications on
PurgeSignatures 21 # Stale signatures
PurgeNeutral 90 # Tokens with neutralish probabilities
PurgeUnused 90 # Unused tokens
PurgeHapaxes 30 # Tokens with less than 5 hits (hapaxes)
PurgeHits1S 15 # Tokens with only 1 spam hit
PurgeHits1I 15 # Tokens with only 1 innocent hit
LocalMX 127.0.0.1 192.168.0.3 213.75.3.22 213.10.163.78
SystemLog on
UserLog on
Opt out
TrackSources spam
Broken lineStripping
MaxMessageSize 1024000
ServerHost 127.0.0.1
ServerPort 24
ServerQueueSize 32
ServerPID /var/run/dspam.pid
ServerMode standard
ServerParameters "--deliver=innocent,spam -d %u"
ServerIdent "dspam-in"
ProcessorBias on
Best,
--Tonni
--
Tony Earnshaw
Email: tonni at hetnet dot nl