Hello:

I'm in the process of converting my small FreeBSD server from sendmail/crm114/mbox to exim/dspam/maildir. This is proving to be a bigger job than I originally planned (aren't they all?). I apologize for the long email, but I need to explain the situation.

I first installed all the necessary programs on my desktop PC (FreeBSD/i386) so that I could get my test configuration working before I took everything "live".

I installed dspam from FreeBSD's ports. I'm using a MySQL 5.0 backend and TEFT training mode, not in daemon mode, with exim as the LDA and procmail putting my spam into a mailbox based on headers. I decided to train dspam with my last 1000 ham and 1000 spam messages, so I filtered out my CRM114 headers with grep, converted each mbox to maildir, and fed the resulting directories to dspam_train. I'm still not sure if I want "pretraining" or not, but it at least confirmed dspam was working. The results were as follows:

                TP True Positives:            977
                TN True Negatives:            998
                FP False Positives:             2
                FN False Negatives:            23
                SC Spam Corpusfed:              0
                NC Nonspam Corpusfed:           0
                TL Training Left:            1500
                SHR Spam Hit Rate          97.70%
                HSR Ham Strike Rate:        0.20%
                OCA Overall Accuracy:      98.75%
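
The preprocessing described before these results (dropping the CRM114 headers and converting each mbox to maildir) could be sketched roughly as below. This is not the grep pipeline that was actually used; it is a hypothetical Python equivalent built on the standard `mailbox` module. The function name `mbox_to_maildir` and the assumption that every header to drop starts with `X-CRM114` are illustrative only.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path, prefix="X-CRM114"):
    """Copy each message from an mbox into a Maildir, dropping every
    header whose name starts with `prefix` (case-insensitive).
    Returns the number of messages copied."""
    src = mailbox.mbox(mbox_path)
    dst = mailbox.Maildir(maildir_path, create=True)
    wanted = prefix.lower()
    count = 0
    try:
        for msg in src:
            # list() because headers are deleted while looping over names
            for name in list(msg.keys()):
                if name.lower().startswith(wanted):
                    del msg[name]  # removes all occurrences of that header
            dst.add(msg)
            count += 1
    finally:
        src.close()
        dst.close()
    return count
```

Running it once for the ham mbox and once for the spam mbox would give two directories of the kind dspam_train was fed here.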


Not bad, I thought. Happy with everything, I installed it all the same way (from the ports tree with the same options) on my server, which incidentally is FreeBSD/sparc64. The results of the identical training are as follows:

                TP True Positives:            913
                TN True Negatives:           1000
                FP False Positives:             0
                FN False Negatives:            87
                SC Spam Corpusfed:              1
                NC Nonspam Corpusfed:           0
                TL Training Left:            1500
                SHR Spam Hit Rate          91.30%
                HSR Ham Strike Rate:        0.00%
                OCA Overall Accuracy:      95.65%
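
As a sanity check on the two tables, the percentages can be recomputed from the raw counts. The formulas below are inferred from the numbers shown (they reproduce both tables), not taken from dspam's source, and the function name `rates` is just for illustration.

```python
def rates(tp, tn, fp, fn):
    """Return (SHR, HSR, OCA) as percentages from the four raw counts."""
    shr = 100.0 * tp / (tp + fn)                   # spam hit rate: spam caught / all spam
    hsr = 100.0 * fp / (fp + tn)                   # ham strike rate: ham flagged / all ham
    oca = 100.0 * (tp + tn) / (tp + tn + fp + fn)  # overall accuracy
    return shr, hsr, oca


# Counts copied from the two result tables.
i386 = rates(tp=977, tn=998, fp=2, fn=23)
sparc64 = rates(tp=913, tn=1000, fp=0, fn=87)

for name, (shr, hsr, oca) in (("i386", i386), ("sparc64", sparc64)):
    print(f"{name:8s} SHR {shr:.2f}%  HSR {hsr:.2f}%  OCA {oca:.2f}%")
```

The whole gap is in missed spam: 87 false negatives on sparc64 against 23 on i386, while sparc64 has two fewer false positives.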

Why is dspam so much better on my Athlon than it is on my UltraSPARC? The versions of FreeBSD are identical, with the same version of dspam, the same build variables, and the same training corpus (and the logs indicate the messages were processed in the same order). I've run the training several times now (starting with an empty MySQL db and the same X-CRM114-header-stripped mbox files), and the results from each machine are reproducible.

Any ideas? I don't want to put the less accurate dspam into production (especially if I've found a bug). I can send the log files if that would help, just let me know what other info is relevant.

Thanks,
-Peter

