Raj wrote, on 03-08-2007 06:18:

i had a question concerning dspam training ...

i used shared group -- one single user "common" for the entire server with toe mode

Same here with the group, on 2 sites with entirely differently configured Postfix MTAs, except that on both I use a shared group with teft. One of the sites is my own PC with Postfix/Fetchmail, where few of Postfix's configurable anti-spam features are usable; the other is a production site for 1500+ users, on which Postfix/policyd refuses a massive (and daily increasing) amount of stuff before it ever gets to dspam.

i train dspam using aliases -- ie just forward to spam / not-spam aliases

I train dspam by having the user drag incorrectly judged messages (spam or non-spam) to a "misjudged" folder, on which a cron job runs every hour. Same at both sites.
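For anyone wanting to set up something similar, here is a minimal sketch of what such an hourly job could look like. It is not the script used on these sites. It assumes the "misjudged" folder is a Maildir, that delivered messages still carry dspam's X-DSPAM-Result header, and that retraining is done with the dspam command line (--user, --class, --source=error). The folder path and user name in the usage note are hypothetical.

```python
import mailbox
import subprocess


def retrain_class(result_header):
    """Return the class a misjudged message should be retrained as.

    The message sits in the "misjudged" folder, so the correct class is
    the opposite of dspam's original verdict in X-DSPAM-Result.
    """
    verdict = (result_header or "").strip().lower()
    if verdict == "spam":
        return "innocent"  # dspam said spam, the user says it was not
    if verdict in ("innocent", "whitelisted"):
        return "spam"      # dspam let it through, the user says it was spam
    return None            # no usable verdict: skip rather than guess


def dspam_command(user, klass):
    """Build the dspam retraining command; the message goes in on stdin."""
    return ["dspam", "--user", user, "--class=" + klass, "--source=error"]


def retrain_folder(maildir_path, user, dry_run=True):
    """Retrain every message in the folder; return the commands built.

    With dry_run=True (the default) nothing is executed, which makes it
    easy to check what the job would do before putting it in cron.
    """
    box = mailbox.Maildir(maildir_path, create=False)
    commands = []
    for key in box.keys():
        msg = box.get_message(key)
        klass = retrain_class(msg.get("X-DSPAM-Result"))
        if klass is None:
            continue
        cmd = dspam_command(user, klass)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, input=box.get_bytes(key), check=True)
    return commands
```

A dry run such as retrain_folder("/home/someuser/Maildir/.misjudged", "someuser") only lists the commands; the job would still have to move or delete the messages afterwards so they are not retrained again the next hour.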

i have not done any corpus training till today

The school site has had a massive corpus training. The home site didn't at first, but after a while the results were so unsatisfactory that I fed it as much spam and non-spam as I could with dspam_train. The stats don't show spam trained this way as corpusfed, though.

i have never purged the dspam database

I purge both sites every week with 'dspam_clean -p', 'cos I don't trust purge-4.1.sql.

i have noticed a few emails (html text) of absolutely the same type come into my mailbox undetected as spam. This is a rare incident but it happens, i.e. once in around 2-3 days.

The major part of the body content of the spam email, i.e. the html code behind the scenes, is exactly the same. All that varies is the hyperlink at the bottom, which points to a different website every time.

you can see them here
http://24x7server.net/spam.html

Unfortunately, the code renders in my Firefox 2.0.0.6 and all I see is the spam message :)

However, I have 2 of these, from 22-05 and 26-05, in my own site's spam folder and can look at them there. My policy is that any spam that gets into my inbox, and that I therefore have to retrain, goes into the spam folder after training. Everything that dspam judges correctly (80-90 per day) I chuck. The fact that I only have two of these in my spam folder would tend to show that dspam learned them very quickly.

i want to know your experience in this matter ...any tips would be helpful

Change toe to teft. Turn on debugging, go through the debug output for the messages you're interested in and see on what grounds spam is being detected. If you don't immediately know what some of the criteria mean, post here. Make sure logrotate is switched on for the debug log, with compress on. Purging old data does no harm; it doesn't reduce dspam's accuracy. I don't think my spams can help you, since even with a shared group dspam uses the recipient's name in judging, but if you want them I can offer a tarball on my ftp site.

my dspam stats
common:
TP True Positives: 40383
TN True Negatives: 81087
FP False Positives: 41
FN False Negatives: 813
SC Spam Corpusfed: 759
NC Nonspam Corpusfed: 0
TL Training Left: 0
SHR Spam Hit Rate: 98.03%
HSR Ham Strike Rate: 0.05%
OCA Overall Accuracy: 99.30%

That's better than my home site, but not good enough:

                TP True Positives:           3465
                TN True Negatives:          21215
                FP False Positives:             4
                FN False Negatives:           323
                SC Spam Corpusfed:             74
                NC Nonspam Corpusfed:           7
                TL Training Left:               0
                SHR Spam Hit Rate          91.47%
                HSR Ham Strike Rate:        0.02%
                OCA Overall Accuracy:      98.69%

The school's site is:

                TP True Positives:          20963
                TN True Negatives:         111208
                FP False Positives:           508
                FN False Negatives:           408
                SC Spam Corpusfed:           3486
                NC Nonspam Corpusfed:        3002
                TL Training Left:               0
                SHR Spam Hit Rate          98.09%
                HSR Ham Strike Rate:        0.45%
                OCA Overall Accuracy:      99.31%

I'm content with that.
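For anyone comparing their own figures with the three sets of stats in this mail, the percentages can be recomputed from the four raw counters. The formulas below are inferred from the posted numbers (they reproduce all three blocks to two decimals), not taken from the dspam source, so treat this as a sketch. The function name is made up for illustration.

```python
def dspam_rates(tp, tn, fp, fn):
    """Recompute SHR, HSR and OCA (in percent) from the raw counters.

    Assumed formulas, inferred from the posted stats:
      SHR = TP / (TP + FN)     share of spam that was caught
      HSR = FP / (FP + TN)     share of ham wrongly flagged as spam
      OCA = (TP + TN) / total  share of all messages judged correctly
    """
    shr = 100.0 * tp / (tp + fn)
    hsr = 100.0 * fp / (fp + tn)
    oca = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return round(shr, 2), round(hsr, 2), round(oca, 2)


# Counters from the three stats blocks posted in this mail.
print(dspam_rates(40383, 81087, 41, 813))     # Raj's "common" user
print(dspam_rates(3465, 21215, 4, 323))       # home site
print(dspam_rates(20963, 111208, 508, 408))   # school site
```

Note that the corpusfed counters (SC, NC) play no part in these three rates.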

Best,

--Tonni

--
Tony Earnshaw
Email: tonni at hetnet dot nl
