On Fri, Jul 24, 2015 at 10:17:18AM -0700, waterdog wrote:
> Okay, I apologize for all the following questions but, the more I
> troubleshoot dspam without progress, the more questions I have.
> 
> Are there recommendations/documentation on how to properly train?  It seems
> that some users do corpus training and other users just train based on
> actual messages.
> 
> What are the pros/cons of using a corpus vs. actual messages?
> 
> Does it help to retrain multiple times using the same corpus and/or
> messages?
> 
> What are the specific stats that one should look to achieve to determine if
> dspam has had enough training?
> 
> Does TL need to be at zero before dspam will work at all?
> 
> Do you have to train separately for each user or can all users share the
> same training?  
> 
> I've tried training and retraining multiple times using corpuses and actual
> messages but don't seem to be making any real progress.  Here are my current
> stats after training with a corpus:
> 
> sudo dspam_train <username> spam_2 easy_ham_2
> 
> sudo dspam_stats -H <username>
> 
>                 TP True Positives:                     0
>                 TN True Negatives:                  1315
>                 FP False Positives:                 2443
>                 FN False Negatives:                 2154
>                 SC Spam Corpusfed:                     0
>                 NC Nonspam Corpusfed:                  0
>                 TL Training Left:                      0
>                 SHR Spam Hit Rate                  0.00%
>                 HSR Ham Strike Rate:              65.01%
>                 PPV Positive predictive value:     0.00%
>                 OCA Overall Accuracy:             22.24%
> 
> As you can see, the OCA is still low but better than it was before.
> 
> It might help if someone could post working configurations for postfix,
> dspam, dovecot, and clamAV for comparison.  I've tried to follow the online
> documentation but apparently I'm missing something.
> 
Hi,

I am not sure what your training corpus looks like, but those results are
pretty bad. Training a global/merged group can reduce the accuracy hit at
the beginning, but in general a train-on-error (TOE) setup with no initial
training would probably work better. Training with some known-good (ham)
content helps if your ham/spam ratio is very small. Accuracy is best when
you start with an even mix of spam and ham; TOE will then keep it balanced.
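
For reference, the percentages in the dspam_stats output quoted above are
consistent with the usual confusion-matrix definitions. The Python sketch
below assumes those definitions rather than being taken from the dspam
source, and the function name dspam_rates is only illustrative. It
recomputes the four rates from the raw counts:

```python
def dspam_rates(tp, tn, fp, fn):
    """Return SHR, HSR, PPV and OCA as percentages from raw counts.

    Assumed definitions (standard confusion-matrix ratios, not copied
    from the dspam source):
      SHR = TP / (TP + FN)   share of spam that was caught
      HSR = FP / (FP + TN)   share of ham wrongly flagged as spam
      PPV = TP / (TP + FP)   share of "spam" verdicts that were right
      OCA = (TP + TN) / all  share of all verdicts that were right
    """
    def pct(num, den):
        # Avoid division by zero when a class has no messages yet.
        return 100.0 * num / den if den else 0.0

    return {
        "SHR": pct(tp, tp + fn),
        "HSR": pct(fp, fp + tn),
        "PPV": pct(tp, tp + fp),
        "OCA": pct(tp + tn, tp + tn + fp + fn),
    }


# Counts taken from the dspam_stats output quoted above.
counts = dict(tp=0, tn=1315, fp=2443, fn=2154)
for name, value in dspam_rates(**counts).items():
    print(f"{name}: {value:.2f}%")
# SHR: 0.00%
# HSR: 65.01%
# PPV: 0.00%
# OCA: 22.24%
```

Read this way, TP = 0 means no spam was caught at all, and roughly two
thirds of the ham was flagged as spam.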

Regards,
Ken

------------------------------------------------------------------------------
_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
