Hello Carlo,

> So during training, I would just ignore the message, write "Corrupt 
> message" or something, and move on to the next one.
> > 
I do that already with my enhanced training script. Just as an example, the 
first 200 messages from TREC05:
theia full # head -n 200 index > 200-index
theia full # ../../../../dspam_train_tone_v4 mergedglobal -i 200-index
Taking Snapshot...
mergedglobal
    TP:     0 TN:     0 FP:     1 FN:     0 SC: 93406 NC: 52955
=================================================================
Training corpora:
  Using index file:    200-index
Parameters:
  Show subject:        No
  Refute:              No
  Spam TONE Threshold: 0
  Ham TONE Threshold:  0
  Maximum retrain:     1
=================================================================

Training on 200-index index...
[test: nonspam] ../data/000/000                  result: PASS [0.99]
[test: nonspam] ../data/000/001                  result: PASS [0.75]
[test: nonspam] ../data/000/002                  result: PASS [0.84]
[test: nonspam] ../data/000/003                  result: PASS [0.85]
[test: nonspam] ../data/000/004                  result: PASS [0.85]
[test: nonspam] ../data/000/005                  result: PASS [0.99]
[test: nonspam] ../data/000/006                  result: PASS [0.76]
[test: nonspam] ../data/000/007                  result: PASS [0.76]
[test: nonspam] ../data/000/008                  result: PASS [0.59]
[test: nonspam] ../data/000/009                  result: PASS [0.85]
[test: nonspam] ../data/000/010                  result: PASS [0.56]
[test: nonspam] ../data/000/011                  result: PASS [0.69]
[test: nonspam] ../data/000/012                  result: PASS [0.93]
[test: nonspam] ../data/000/013                  result: PASS [0.75]
[test: nonspam] ../data/000/014                  result: PASS [0.56]
[test: nonspam] ../data/000/015                  result: PASS [0.99]
[test: nonspam] ../data/000/016                  result: PASS [0.76]
[test: nonspam] ../data/000/017                  result: PASS [0.91]
[test: nonspam] ../data/000/018                  result: PASS [0.99]
[test: nonspam] ../data/000/019                  result: PASS [0.85]
[test: nonspam] ../data/000/020                  result: PASS [0.68]
[test: nonspam] ../data/000/021                  result: PASS [0.70]
[test: nonspam] ../data/000/022                  result: PASS [0.83]
[test: nonspam] ../data/000/023                  result: PASS [0.85]
[test: nonspam] ../data/000/024                  result: PASS [0.56]
[test: nonspam] ../data/000/025                  result: PASS [0.65]
[test: nonspam] ../data/000/026                  result: PASS [0.92]
[test: nonspam] ../data/000/027                  result: PASS [1.00]
[test: nonspam] ../data/000/028                  result: PASS [0.80]
[test: nonspam] ../data/000/029                  result: PASS [0.65]
[test: nonspam] ../data/000/030                  result: PASS [0.65]
[test: nonspam] ../data/000/031                  result: PASS [0.62]
[test: nonspam] ../data/000/032                  result: PASS [0.69]
[test: nonspam] ../data/000/033                  result: PASS [0.99]
[test: nonspam] ../data/000/034                  result: PASS [0.75]
[test: nonspam] ../data/000/035                  result: PASS [0.96]
[test: nonspam] ../data/000/036                  result: PASS [0.69]
[test: nonspam] ../data/000/037                  result: PASS [0.60]
[test: nonspam] ../data/000/038                  result: PASS [0.99]
[test: nonspam] ../data/000/039                  result: PASS [0.76]
[test: nonspam] ../data/000/040                  result: PASS [0.70]
[test: nonspam] ../data/000/041                  result: PASS [0.71]
[test: nonspam] ../data/000/042                  result: PASS [0.76]
[test: nonspam] ../data/000/043                  result: PASS [0.65]
[test: nonspam] ../data/000/044                  result: PASS [0.86]
[test: nonspam] ../data/000/045                  result: PASS [0.85]
[test: nonspam] ../data/000/046                  result: PASS [0.85]
[test: nonspam] ../data/000/047                  result: PASS [0.72]
[test: nonspam] ../data/000/048                  result: PASS [0.85]
[test: nonspam] ../data/000/049                  result: PASS [1.00]
[test: nonspam] ../data/000/050                  result: PASS [0.99]
[test: nonspam] ../data/000/051                  result: PASS [0.99]
[test: nonspam] ../data/000/052                  result: PASS [0.99]
[test: nonspam] ../data/000/053                  result: PASS [0.57]
[test: nonspam] ../data/000/054                  result: PASS [0.75]
[test: nonspam] ../data/000/055                  result: PASS [0.85]
[test: nonspam] ../data/000/056                  result: PASS [0.85]
[test: nonspam] ../data/000/057                  result: PASS [0.67]
[test: nonspam] ../data/000/058                  result: PASS [0.70]
[test: nonspam] ../data/000/059                  result: PASS [0.99]
[test: nonspam] ../data/000/060                  result: PASS [0.85]
[test: nonspam] ../data/000/061                  result: PASS [0.75]
[test: nonspam] ../data/000/062                  result: PASS [0.99]
[test: nonspam] ../data/000/063                  result: PASS [0.70]
[test: nonspam] ../data/000/064                  result: PASS [0.85]
[test: nonspam] ../data/000/065                  result: PASS [0.85]
[test: nonspam] ../data/000/066                  result: PASS [0.99]
[test: nonspam] ../data/000/067                  result: PASS [0.76]
[test: nonspam] ../data/000/068                  result: PASS [0.76]
[test: nonspam] ../data/000/069                  result: PASS [0.85]
[test: nonspam] ../data/000/070                  result: PASS [0.75]
[test: nonspam] ../data/000/071                  result: PASS [0.75]
[test: nonspam] ../data/000/072                  result: PASS [0.83]
[test: nonspam] ../data/000/073                  result: PASS [0.61]
[test: nonspam] ../data/000/074                  result: PASS [0.53]
[test: nonspam] ../data/000/075                  result: PASS [0.99]
[test: nonspam] ../data/000/076                  result: PASS [0.61]
[test: spam   ] ../data/000/077                  result: PASS [0.99]
[test: spam   ] ../data/000/078                  result: PASS [0.99]
[test: spam   ] ../data/000/079                  result: PASS [0.99]
[test: spam   ] ../data/000/080                  result: PASS [0.99]
[test: spam   ] ../data/000/081                  result: PASS [0.99]
[test: spam   ] ../data/000/082                  result: PASS [0.99]
[test: spam   ] ../data/000/083                  result: PASS [0.99]
[test: spam   ] ../data/000/084                  result: PASS [0.99]
[test: spam   ] ../data/000/085                  result: PASS [0.99]
[test: spam   ] ../data/000/086                  result: PASS [0.99]
[test: spam   ] ../data/000/087                  result: PASS [0.99]
[test: spam   ] ../data/000/088                  result: PASS [0.99]
[test: spam   ] ../data/000/089                  result: PASS [0.99]
[test: spam   ] ../data/000/090                  result: PASS [0.99]
[test: spam   ] ../data/000/091                  result: PASS [0.99]
[test: spam   ] ../data/000/092                  result: PASS [0.99]
[test: spam   ] ../data/000/093                  result: PASS [0.99]
[test: spam   ] ../data/000/094                  result: PASS [0.99]
[test: spam   ] ../data/000/095                  result: PASS [0.99]
[test: spam   ] ../data/000/096                  result: PASS [0.99]
[test: spam   ] ../data/000/097                  result: PASS [0.99]
[test: spam   ] ../data/000/098                  result: PASS [0.99]
[test: spam   ] ../data/000/099                  result: PASS [0.99]
[test: spam   ] ../data/000/100                  result: PASS [0.99]
[test: spam   ] ../data/000/101                  result: PASS [0.99]
[test: spam   ] ../data/000/102                  result: PASS [0.99]
[test: spam   ] ../data/000/103                  result: PASS [0.99]
[test: spam   ] ../data/000/104                  result: PASS [0.99]
[test: spam   ] ../data/000/105                  result: PASS [0.99]
[test: spam   ] ../data/000/106                  result: PASS [0.99]
[test: spam   ] ../data/000/107                  result: PASS [0.99]
[test: spam   ] ../data/000/108                  result: PASS [0.99]
[test: spam   ] ../data/000/109                  result: PASS [0.99]
[test: spam   ] ../data/000/110                  result: PASS [0.99]
[test: spam   ] ../data/000/111                  result: PASS [0.99]
[test: spam   ] ../data/000/112                  result: PASS [0.99]
[test: spam   ] ../data/000/113                  result: PASS [0.99]
[test: spam   ] ../data/000/114                  result: PASS [0.99]
[test: spam   ] ../data/000/115                  result: PASS [0.99]
[test: spam   ] ../data/000/116                  result: PASS [0.99]
[test: spam   ] ../data/000/117                  result: PASS [0.84]
[test: spam   ] ../data/000/118                  result: PASS [0.99]
[test: spam   ] ../data/000/119                  result: PASS [0.99]
[test: spam   ] ../data/000/120                  result: PASS [0.99]
[test: spam   ] ../data/000/121                  result: PASS [0.99]
[test: nonspam] ../data/000/122                  result: PASS [0.76]
[test: nonspam] ../data/000/123                  result: PASS [0.62]
[test: spam   ] ../data/000/124                  result: PASS [0.84]
[test: spam   ] ../data/000/125                  result: PASS [0.99]
[test: spam   ] ../data/000/126                  result: PASS [0.99]
[test: spam   ] ../data/000/127                  result: PASS [0.99]
[test: spam   ] ../data/000/128                  result: PASS [0.99]
[test: spam   ] ../data/000/129                  result: PASS [0.99]
[test: spam   ] ../data/000/130                  result: PASS [0.99]
[test: spam   ] ../data/000/131                  result: PASS [0.99]
[test: spam   ] ../data/000/132                  result: PASS [0.99]
[test: spam   ] ../data/000/133                  result: PASS [0.99]
[test: spam   ] ../data/000/134                  result: PASS [0.99]
[test: spam   ] ../data/000/135                  result: PASS [0.99]
[test: spam   ] ../data/000/136                  result: PASS [0.99]
[test: spam   ] ../data/000/137                  result: PASS [0.99]
[test: spam   ] ../data/000/138                  result: PASS [0.89]
[test: spam   ] ../data/000/139                  result: PASS [0.89]
[test: spam   ] ../data/000/140                  result: PASS [0.89]
[test: spam   ] ../data/000/141                  result: PASS [0.89]
[test: spam   ] ../data/000/142                  result: PASS [0.89]
[test: spam   ] ../data/000/143                  result: PASS [0.89]
[test: spam   ] ../data/000/144                  result: PASS [0.99]
[test: spam   ] ../data/000/145                  result: PASS [0.99]
[test: spam   ] ../data/000/146                  result: PASS [0.99]
[test: spam   ] ../data/000/147                  result: PASS [0.99]
[test: spam   ] ../data/000/148                  result: PASS [0.99]
[test: spam   ] ../data/000/149                  result: PASS [0.99]
[test: spam   ] ../data/000/150                  result: PASS [0.99]
[test: spam   ] ../data/000/151                  result: PASS [0.99]
[test: spam   ] ../data/000/152                  result: PASS [0.99]
[test: spam   ] ../data/000/153                  result: PASS [0.99]
[test: spam   ] ../data/000/154                  result: PASS [0.99]
[test: spam   ] ../data/000/155                  result: PASS [0.99]
[test: spam   ] ../data/000/156                  result: PASS [0.99]
[test: spam   ] ../data/000/157                  result: PASS [0.89]
[test: spam   ] ../data/000/158                  result: PASS [0.99]
[test: spam   ] ../data/000/159                  result: PASS [0.99]
[test: spam   ] ../data/000/160                  result: PASS [0.99]
[test: spam   ] ../data/000/161                  result: PASS [0.78]
[test: spam   ] ../data/000/162                  result: PASS [0.83]
[test: spam   ] ../data/000/163                  result: PASS [0.99]
[test: spam   ] ../data/000/164                  result: PASS [0.99]
[test: spam   ] ../data/000/165                  result: PASS [0.99]
[test: spam   ] ../data/000/166                  result: PASS [0.99]
[test: spam   ] ../data/000/167                  result: PASS [0.99]
[test: spam   ] ../data/000/168                  result: PASS [0.99]
[test: spam   ] ../data/000/169                  result: PASS [0.86]
[test: spam   ] ../data/000/170                  result: PASS [0.78]
[test: spam   ] ../data/000/171                  result: PASS [0.79]
[test: spam   ] ../data/000/172                  result: PASS [0.99]
[test: spam   ] ../data/000/173                  result: PASS [0.99]
[test: spam   ] ../data/000/174                  result: PASS [0.99]
[test: spam   ] ../data/000/175                  result: PASS [0.99]
[test: spam   ] ../data/000/176                  result: PASS [0.99]
[test: spam   ] ../data/000/177                  result: PASS [0.99]
[test: spam   ] ../data/000/178                  result: PASS [0.99]
[test: spam   ] ../data/000/179                  result: PASS [0.99]
[test: spam   ] ../data/000/180                  result: PASS [0.99]
[test: spam   ] ../data/000/181                  result: PASS [0.99]
[test: spam   ] ../data/000/182                  result: PASS [0.99]
[test: spam   ] ../data/000/183                  result: PASS [0.99]
[test: spam   ] ../data/000/184                  result: PASS [0.91]
[test: spam   ] ../data/000/185                  result: PASS [0.99]
[test: spam   ] ../data/000/186                  result: PASS [0.99]
[test: spam   ] ../data/000/187                  result: PASS [0.99]
[test: spam   ] ../data/000/188                  result: PASS [0.99]
[test: spam   ] ../data/000/189                  result: PASS [0.99]
[test: spam   ] ../data/000/190                  result: PASS [0.99]
[test: spam   ] ../data/000/191                  result: PASS [0.99]
[test: spam   ] ../data/000/192                  result: PASS [0.99]
[test: spam   ] ../data/000/193                  result: PASS [0.99]
[test: spam   ] ../data/000/194                  result: PASS [0.99]
[test: spam   ] ../data/000/195                  result: PASS [0.99]
[test: spam   ] ../data/000/196                  result: PASS [0.99]
[test: spam   ] ../data/000/197                  result: PASS [0.99]
[test: spam   ] ../data/000/198                  result: PASS [0.99]
[test: spam   ] ../data/000/199                  result: PASS [0.99]
TRAINING COMPLETE

=================================================================
 Processed: 200 | TP: 121 | TN: 79 | FP: 0 | FN: 0
=================================================================

Training Snapshot:
mergedglobal
    TP:     0 TN:     0 FP:     0 FN:     0 SC:     0 NC:     0
    SHR:  100.00%       HSR:    0.00%       OCA:  100.00%

Overall Statistics:
mergedglobal
    TP:     0 TN:     0 FP:     1 FN:     0 SC: 93406 NC: 52955
    SHR:  100.00%       HSR:  100.00%       OCA:    0.00%
theia full # ../../../../dspam_train_tone_v4
ERROR: spam corpus must be path to maildir directory or MBOX file.

Usage: ../../../../dspam_train_tone_v4
  [username]                     DSPAM user name
  [--client]                     To run in client mode
  [--refute]                     To unlearn errors from opposite class
  [--subject]                    To show subject from error/unlearn/TONE
  [--max-retrain max_retrain]    Maximum relearns per error/TONE
  [--spam-threshold threshold]   TONE Spam threshold
  [--ham-threshold threshold]    TONE Ham threshold
  [[-i index]|[spam_dir] [nonspam_dir]]

theia full #


I train quite differently than most people do. I don't train on signatures, and 
I use a double-sided asymmetric threshold as a reinforcement zone for TONE 
(Train On error or Near Error) training. The example above does not use it, but 
if I re-run it with a threshold (an insane value, just to show how it works), 
the output looks like this:
theia full # ../../../../dspam_train_tone_v4 mergedglobal -s -r -m 3 -st 80 -ht 40 -i 200-index
Taking Snapshot...
mergedglobal
    TP:     0 TN:     0 FP:     1 FN:     0 SC: 93406 NC: 52955
=================================================================
Training corpora:
  Using index file:    200-index
Parameters:
  Show subject:        Yes
  Refute:              Yes
  Spam TONE Threshold: 0.8
  Ham TONE Threshold:  0.4
  Maximum retrain:     3
=================================================================

Training on 200-index index...
[test: nonspam] ../data/000/000                  result: PASS [0.99]
[test: nonspam] ../data/000/001                  result: PASS [0.75]
[test: nonspam] ../data/000/002                  result: PASS [0.84]
[test: nonspam] ../data/000/003                  result: PASS [0.85]
[test: nonspam] ../data/000/004                  result: PASS [0.85]
[test: nonspam] ../data/000/005                  result: PASS [0.99]
[test: nonspam] ../data/000/006                  result: PASS [0.76]
[test: nonspam] ../data/000/007                  result: PASS [0.76]
[test: nonspam] ../data/000/008                  result: PASS [0.59]
[test: nonspam] ../data/000/009                  result: PASS [0.85]
[test: nonspam] ../data/000/010                  result: PASS [0.56]
[test: nonspam] ../data/000/011                  result: PASS [0.69]
[test: nonspam] ../data/000/012                  result: PASS [0.93]
[test: nonspam] ../data/000/013                  result: PASS [0.75]
[test: nonspam] ../data/000/014                  result: PASS [0.56]
[test: nonspam] ../data/000/015                  result: PASS [0.99]
[test: nonspam] ../data/000/016                  result: PASS [0.76]
[test: nonspam] ../data/000/017                  result: PASS [0.91]
[test: nonspam] ../data/000/018                  result: PASS [0.99]
[test: nonspam] ../data/000/019                  result: PASS [0.85]
[test: nonspam] ../data/000/020                  result: PASS [0.68]
[test: nonspam] ../data/000/021                  result: PASS [0.70]
[test: nonspam] ../data/000/022                  result: PASS [0.83]
[test: nonspam] ../data/000/023                  result: PASS [0.85]
[test: nonspam] ../data/000/024                  result: PASS [0.56]
[test: nonspam] ../data/000/025                  result: PASS [0.65]
[test: nonspam] ../data/000/026                  result: PASS [0.92]
[test: nonspam] ../data/000/027                  result: PASS [1.00]
[test: nonspam] ../data/000/028                  result: PASS [0.80]
[test: nonspam] ../data/000/029                  result: PASS [0.65]
[test: nonspam] ../data/000/030                  result: PASS [0.65]
[test: nonspam] ../data/000/031                  result: PASS [0.62]
[test: nonspam] ../data/000/032                  result: PASS [0.69]
[test: nonspam] ../data/000/033                  result: PASS [0.99]
[test: nonspam] ../data/000/034                  result: PASS [0.75]
[test: nonspam] ../data/000/035                  result: PASS [0.96]
[test: nonspam] ../data/000/036                  result: PASS [0.69]
[test: nonspam] ../data/000/037                  result: PASS [0.60]
[test: nonspam] ../data/000/038                  result: PASS [0.99]
[test: nonspam] ../data/000/039                  result: PASS [0.76]
[test: nonspam] ../data/000/040                  result: PASS [0.70]
[test: nonspam] ../data/000/041                  result: PASS [0.71]
[test: nonspam] ../data/000/042                  result: PASS [0.76]
[test: nonspam] ../data/000/043                  result: PASS [0.65]
[test: nonspam] ../data/000/044                  result: PASS [0.86]
[test: nonspam] ../data/000/045                  result: PASS [0.85]
[test: nonspam] ../data/000/046                  result: PASS [0.85]
[test: nonspam] ../data/000/047                  result: PASS [0.72]
[test: nonspam] ../data/000/048                  result: PASS [0.85]
[test: nonspam] ../data/000/049                  result: PASS [1.00]
[test: nonspam] ../data/000/050                  result: PASS [0.99]
[test: nonspam] ../data/000/051                  result: PASS [0.99]
[test: nonspam] ../data/000/052                  result: PASS [0.99]
[test: nonspam] ../data/000/053                  result: PASS [0.57]
[test: nonspam] ../data/000/054                  result: PASS [0.75]
[test: nonspam] ../data/000/055                  result: PASS [0.85]
[test: nonspam] ../data/000/056                  result: PASS [0.85]
[test: nonspam] ../data/000/057                  result: PASS [0.67]
[test: nonspam] ../data/000/058                  result: PASS [0.70]
[test: nonspam] ../data/000/059                  result: PASS [0.99]
[test: nonspam] ../data/000/060                  result: PASS [0.85]
[test: nonspam] ../data/000/061                  result: PASS [0.75]
[test: nonspam] ../data/000/062                  result: PASS [0.99]
[test: nonspam] ../data/000/063                  result: PASS [0.70]
[test: nonspam] ../data/000/064                  result: PASS [0.85]
[test: nonspam] ../data/000/065                  result: PASS [0.85]
[test: nonspam] ../data/000/066                  result: PASS [0.99]
[test: nonspam] ../data/000/067                  result: PASS [0.76]
[test: nonspam] ../data/000/068                  result: PASS [0.76]
[test: nonspam] ../data/000/069                  result: PASS [0.85]
[test: nonspam] ../data/000/070                  result: PASS [0.75]
[test: nonspam] ../data/000/071                  result: PASS [0.75]
[test: nonspam] ../data/000/072                  result: PASS [0.83]
[test: nonspam] ../data/000/073                  result: PASS [0.61]
[test: nonspam] ../data/000/074                  result: PASS [0.53]
[test: nonspam] ../data/000/075                  result: PASS [0.99]
[test: nonspam] ../data/000/076                  result: PASS [0.61]
[test: spam   ] ../data/000/077                  result: PASS [0.99]
[test: spam   ] ../data/000/078                  result: PASS [0.99]
[test: spam   ] ../data/000/079                  result: PASS [0.99]
[test: spam   ] ../data/000/080                  result: PASS [0.99]
[test: spam   ] ../data/000/081                  result: PASS [0.99]
[test: spam   ] ../data/000/082                  result: PASS [0.99]
[test: spam   ] ../data/000/083                  result: PASS [0.99]
[test: spam   ] ../data/000/084                  result: PASS [0.99]
[test: spam   ] ../data/000/085                  result: PASS [0.99]
[test: spam   ] ../data/000/086                  result: PASS [0.99]
[test: spam   ] ../data/000/087                  result: PASS [0.99]
[test: spam   ] ../data/000/088                  result: PASS [0.99]
[test: spam   ] ../data/000/089                  result: PASS [0.99]
[test: spam   ] ../data/000/090                  result: PASS [0.99]
[test: spam   ] ../data/000/091                  result: PASS [0.99]
[test: spam   ] ../data/000/092                  result: PASS [0.99]
[test: spam   ] ../data/000/093                  result: PASS [0.99]
[test: spam   ] ../data/000/094                  result: PASS [0.99]
[test: spam   ] ../data/000/095                  result: PASS [0.99]
[test: spam   ] ../data/000/096                  result: PASS [0.99]
[test: spam   ] ../data/000/097                  result: PASS [0.99]
[test: spam   ] ../data/000/098                  result: PASS [0.99]
[test: spam   ] ../data/000/099                  result: PASS [0.99]
[test: spam   ] ../data/000/100                  result: PASS [0.99]
[test: spam   ] ../data/000/101                  result: PASS [0.99]
[test: spam   ] ../data/000/102                  result: PASS [0.99]
[test: spam   ] ../data/000/103                  result: PASS [0.99]
[test: spam   ] ../data/000/104                  result: PASS [0.99]
[test: spam   ] ../data/000/105                  result: PASS [0.99]
[test: spam   ] ../data/000/106                  result: PASS [0.99]
[test: spam   ] ../data/000/107                  result: PASS [0.99]
[test: spam   ] ../data/000/108                  result: PASS [0.99]
[test: spam   ] ../data/000/109                  result: PASS [0.99]
[test: spam   ] ../data/000/110                  result: PASS [0.99]
[test: spam   ] ../data/000/111                  result: PASS [0.99]
[test: spam   ] ../data/000/112                  result: PASS [0.99]
[test: spam   ] ../data/000/113                  result: PASS [0.99]
[test: spam   ] ../data/000/114                  result: PASS [0.99]
[test: spam   ] ../data/000/115                  result: PASS [0.99]
[test: spam   ] ../data/000/116                  result: PASS [0.99]
[test: spam   ] ../data/000/117                  result: PASS [0.84]
[test: spam   ] ../data/000/118                  result: PASS [0.99]
[test: spam   ] ../data/000/119                  result: PASS [0.99]
[test: spam   ] ../data/000/120                  result: PASS [0.99]
[test: spam   ] ../data/000/121                  result: PASS [0.99]
[test: nonspam] ../data/000/122                  result: PASS [0.76]
[test: nonspam] ../data/000/123                  result: PASS [0.62]
[test: spam   ] ../data/000/124                  result: PASS [0.84]
[test: spam   ] ../data/000/125                  result: PASS [0.99]
[test: spam   ] ../data/000/126                  result: PASS [0.99]
[test: spam   ] ../data/000/127                  result: PASS [0.99]
[test: spam   ] ../data/000/128                  result: PASS [0.99]
[test: spam   ] ../data/000/129                  result: PASS [0.99]
[test: spam   ] ../data/000/130                  result: PASS [0.99]
[test: spam   ] ../data/000/131                  result: PASS [0.99]
[test: spam   ] ../data/000/132                  result: PASS [0.99]
[test: spam   ] ../data/000/133                  result: PASS [0.99]
[test: spam   ] ../data/000/134                  result: PASS [0.99]
[test: spam   ] ../data/000/135                  result: PASS [0.99]
[test: spam   ] ../data/000/136                  result: PASS [0.99]
[test: spam   ] ../data/000/137                  result: PASS [0.99]
[test: spam   ] ../data/000/138                  result: PASS [0.89]
[test: spam   ] ../data/000/139                  result: PASS [0.89]
[test: spam   ] ../data/000/140                  result: PASS [0.89]
[test: spam   ] ../data/000/141                  result: PASS [0.89]
[test: spam   ] ../data/000/142                  result: PASS [0.89]
[test: spam   ] ../data/000/143                  result: PASS [0.89]
[test: spam   ] ../data/000/144                  result: PASS [0.99]
[test: spam   ] ../data/000/145                  result: PASS [0.99]
[test: spam   ] ../data/000/146                  result: PASS [0.99]
[test: spam   ] ../data/000/147                  result: PASS [0.99]
[test: spam   ] ../data/000/148                  result: PASS [0.99]
[test: spam   ] ../data/000/149                  result: PASS [0.99]
[test: spam   ] ../data/000/150                  result: PASS [0.99]
[test: spam   ] ../data/000/151                  result: PASS [0.99]
[test: spam   ] ../data/000/152                  result: PASS [0.99]
[test: spam   ] ../data/000/153                  result: PASS [0.99]
[test: spam   ] ../data/000/154                  result: PASS [0.99]
[test: spam   ] ../data/000/155                  result: PASS [0.99]
[test: spam   ] ../data/000/156                  result: PASS [0.99]
[test: spam   ] ../data/000/157                  result: PASS [0.89]
[test: spam   ] ../data/000/158                  result: PASS [0.99]
[test: spam   ] ../data/000/159                  result: PASS [0.99]
[test: spam   ] ../data/000/160                  result: PASS [0.99]
[test: spam   ] ../data/000/161                  result: PASS [0.78]
        [tone] Subject: =?utf-8?q?Pimpware?=
[test: spam   ] ../data/000/161                  result: PASS [0.82]
[test: spam   ] ../data/000/162                  result: PASS [0.83]
[test: spam   ] ../data/000/163                  result: PASS [0.99]
[test: spam   ] ../data/000/164                  result: PASS [0.99]
[test: spam   ] ../data/000/165                  result: PASS [0.99]
[test: spam   ] ../data/000/166                  result: PASS [0.99]
[test: spam   ] ../data/000/167                  result: PASS [0.99]
[test: spam   ] ../data/000/168                  result: PASS [0.99]
[test: spam   ] ../data/000/169                  result: PASS [0.86]
[test: spam   ] ../data/000/170                  result: PASS [0.78]
        [tone] Subject: =?utf-8?q?This report must mod?=
[test: spam   ] ../data/000/170                  result: PASS [0.87]
[test: spam   ] ../data/000/171                  result: PASS [0.79]
        [tone] Subject: =?utf-8?q?This dispatch must a?=
[test: spam   ] ../data/000/171                  result: PASS [0.80]
        [tone] Subject: =?utf-8?q?This dispatch must a?=
[test: spam   ] ../data/000/171                  result: PASS [0.82]
[test: spam   ] ../data/000/172                  result: PASS [0.99]
[test: spam   ] ../data/000/173                  result: PASS [0.99]
[test: spam   ] ../data/000/174                  result: PASS [0.99]
[test: spam   ] ../data/000/175                  result: PASS [0.99]
[test: spam   ] ../data/000/176                  result: PASS [0.99]
[test: spam   ] ../data/000/177                  result: PASS [0.99]
[test: spam   ] ../data/000/178                  result: PASS [0.99]
[test: spam   ] ../data/000/179                  result: PASS [0.99]
[test: spam   ] ../data/000/180                  result: PASS [0.99]
[test: spam   ] ../data/000/181                  result: PASS [0.99]
[test: spam   ] ../data/000/182                  result: PASS [0.99]
[test: spam   ] ../data/000/183                  result: PASS [0.99]
[test: spam   ] ../data/000/184                  result: PASS [0.91]
[test: spam   ] ../data/000/185                  result: PASS [0.99]
[test: spam   ] ../data/000/186                  result: PASS [0.99]
[test: spam   ] ../data/000/187                  result: PASS [0.99]
[test: spam   ] ../data/000/188                  result: PASS [0.99]
[test: spam   ] ../data/000/189                  result: PASS [0.99]
[test: spam   ] ../data/000/190                  result: PASS [0.99]
[test: spam   ] ../data/000/191                  result: PASS [0.99]
[test: spam   ] ../data/000/192                  result: PASS [0.99]
[test: spam   ] ../data/000/193                  result: PASS [0.99]
[test: spam   ] ../data/000/194                  result: PASS [0.99]
[test: spam   ] ../data/000/195                  result: PASS [0.99]
[test: spam   ] ../data/000/196                  result: PASS [0.99]
[test: spam   ] ../data/000/197                  result: PASS [0.99]
[test: spam   ] ../data/000/198                  result: PASS [0.99]
[test: spam   ] ../data/000/199                  result: PASS [0.99]
TRAINING COMPLETE

=================================================================
 Processed: 200 | TP: 121 | TN: 79 | FP: 0 | FN: 0
=================================================================

Training Snapshot:
mergedglobal
    TP:     0 TN:     0 FP:     0 FN:     0 SC:     4 NC:     0
    SHR:  100.00%       HSR:    0.00%       OCA:  100.00%

Overall Statistics:
mergedglobal
    TP:     0 TN:     0 FP:     1 FN:     0 SC: 93410 NC: 52955
    SHR:  100.00%       HSR:  100.00%       OCA:    0.00%
theia full #


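You can see that messages whose confidence was below my thresholds (0.80 for 
spam and 0.40 for ham) got learned anyway, even though the initial 
classification already hit the proper class. Here that is 000/161, 000/170 and 
000/171 (the last one twice), which is why SC went up by four.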
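
A minimal sketch of the TONE decision described above, written in Python for illustration. It is not the code of dspam_train_tone_v4: the function names, the callback interface and the strict "less than" comparison are all assumptions, and the default values are simply the ones used in the run above.

```python
from typing import Callable, Tuple

# Hypothetical names throughout; this is not the dspam_train_tone_v4 source.
Verdict = Tuple[str, float]  # (predicted class, confidence in that class)


def needs_training(true_class: str, verdict: Verdict,
                   spam_threshold: float, ham_threshold: float) -> bool:
    """Return True on an error, or on a correct but weak ("near error") result."""
    predicted, confidence = verdict
    if predicted != true_class:
        return True  # plain train-on-error
    # Reinforcement zone: each class has its own threshold (asymmetric).
    threshold = spam_threshold if true_class == "spam" else ham_threshold
    return confidence < threshold  # assumption: strictly below the threshold


def tone_train(true_class: str,
               classify: Callable[[], Verdict],
               learn: Callable[[str], None],
               spam_threshold: float = 0.80,
               ham_threshold: float = 0.40,
               max_retrain: int = 3) -> int:
    """Classify, then learn and re-classify until confident or out of retries."""
    retrains = 0
    verdict = classify()
    while retrains < max_retrain and needs_training(
            true_class, verdict, spam_threshold, ham_threshold):
        learn(true_class)
        retrains += 1
        verdict = classify()
    return retrains
```

Fed the confidences logged for 000/171 (0.79, then 0.80, then 0.82), such a loop would relearn twice, provided the displayed 0.80 is a rounded value just under the threshold. A nonspam message passing at 0.53 would not be relearned at all, because it is above the 0.40 ham threshold.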

To run the whole test on a rather slow system I need less time than I would 
with the original dspam_train script. And I don't need any additional data in 
the dspam_signature_data table. This technique allows me to just run the 
training script and know that it will not kill my storage backend with useless 
data. Running the original dspam_train script leads to roughly 1GB of data in 
dspam_signature_data (because the TREC05 corpus is around 962MB), plus more 
data in dspam_token_data (which is normal if the training script is learning), 
and on top of that it blows up my MySQL replication log for nothing (taking 
even more space). I know that I can purge the DSPAM tables and purge my 
transaction/replication log from MySQL. But I did not want to fix the problem 
by cleaning up afterwards. My goal was/is to not let the problem show up in the 
first place.

I initially wrote the script so that I could have a spam-catching address, 
deliver everything sent to that address into a Maildir, and then run the 
training script against that Maildir during the night to learn everything that 
had been delivered during the day. But I did not like the original script's 
method, where everything gets a signature and errors are then relearned based 
on the signature. That is too heavy for a spam-catching/honeypot system. So I 
started to code a new version and later merged that new version with the 
original dspam_train, since I did not want to have two scripts doing more or 
less the same thing.
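
The nightly spam-trap run could be sketched roughly like this, again in Python and again only as an illustration. The function name and the train_as_spam callback are hypothetical; how the message actually reaches DSPAM is left out on purpose, and only the standard library mailbox module is used to walk the Maildir.

```python
import mailbox
from typing import Callable


def train_spam_trap(maildir_path: str,
                    train_as_spam: Callable[[bytes], None]) -> int:
    """Feed every message in a spam-trap Maildir to a training callback."""
    # create=False: fail loudly if the Maildir does not exist.
    box = mailbox.Maildir(maildir_path, create=False)
    count = 0
    for key in box.iterkeys():
        train_as_spam(box.get_bytes(key))  # raw message bytes, headers included
        count += 1
    box.close()
    return count
```

Everything in the trap is treated as spam here, so no index file and no nonspam corpus are needed. Whether the messages get deleted or archived after training is not covered by this sketch.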

I don't know how others train, but I guess most of them have their own tools 
for that task. I can't imagine that they are using the old and heavy 
dspam_train. But I could be wrong. I don't really know; I have never asked. I 
only know that Paul Cockings asked some time ago how to make something like 
that, and I responded with some questions which he never answered.


> >> Best Regards,
> >> Carlo Rodrigues
> >>
-- 
Kind Regards from Switzerland,

Stevan Bajić

_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel
