Hello Carlo,
> So during training, I would just ignore the message, write "Corrupt
> message" or something, and move on to the next one.
>

I already do that with my enhanced training script. Just as an example, here are the first 200 messages from TREC05:

theia full # head -n 200 index > 200-index
theia full # ../../../../dspam_train_tone_v4 mergedglobal -i 200-index
Taking Snapshot...
mergedglobal TP: 0 TN: 0 FP: 1 FN: 0 SC: 93406 NC: 52955
=================================================================
Training corpora:
    Using index file: 200-index
Parameters:
    Show subject: No
    Refute: No
    Spam TONE Threshold: 0
    Ham TONE Threshold: 0
    Maximum retrain: 1
=================================================================
Training on 200-index index...
[test: nonspam] ../data/000/000 result: PASS [0.99]
[test: nonspam] ../data/000/001 result: PASS [0.75]
[test: nonspam] ../data/000/002 result: PASS [0.84]
[test: nonspam] ../data/000/003 result: PASS [0.85]
[test: nonspam] ../data/000/004 result: PASS [0.85]
[test: nonspam] ../data/000/005 result: PASS [0.99]
[test: nonspam] ../data/000/006 result: PASS [0.76]
[test: nonspam] ../data/000/007 result: PASS [0.76]
[test: nonspam] ../data/000/008 result: PASS [0.59]
[test: nonspam] ../data/000/009 result: PASS [0.85]
[test: nonspam] ../data/000/010 result: PASS [0.56]
[test: nonspam] ../data/000/011 result: PASS [0.69]
[test: nonspam] ../data/000/012 result: PASS [0.93]
[test: nonspam] ../data/000/013 result: PASS [0.75]
[test: nonspam] ../data/000/014 result: PASS [0.56]
[test: nonspam] ../data/000/015 result: PASS [0.99]
[test: nonspam] ../data/000/016 result: PASS [0.76]
[test: nonspam] ../data/000/017 result: PASS [0.91]
[test: nonspam] ../data/000/018 result: PASS [0.99]
[test: nonspam] ../data/000/019 result: PASS [0.85]
[test: nonspam] ../data/000/020 result: PASS [0.68]
[test: nonspam] ../data/000/021 result: PASS [0.70]
[test: nonspam] ../data/000/022 result: PASS [0.83]
[test: nonspam] ../data/000/023 result: PASS [0.85]
[test: nonspam] ../data/000/024 result: PASS [0.56]
[test: nonspam] ../data/000/025 result: PASS [0.65]
[test: nonspam] ../data/000/026 result: PASS [0.92]
[test: nonspam] ../data/000/027 result: PASS [1.00]
[test: nonspam] ../data/000/028 result: PASS [0.80]
[test: nonspam] ../data/000/029 result: PASS [0.65]
[test: nonspam] ../data/000/030 result: PASS [0.65]
[test: nonspam] ../data/000/031 result: PASS [0.62]
[test: nonspam] ../data/000/032 result: PASS [0.69]
[test: nonspam] ../data/000/033 result: PASS [0.99]
[test: nonspam] ../data/000/034 result: PASS [0.75]
[test: nonspam] ../data/000/035 result: PASS [0.96]
[test: nonspam] ../data/000/036 result: PASS [0.69]
[test: nonspam] ../data/000/037 result: PASS [0.60]
[test: nonspam] ../data/000/038 result: PASS [0.99]
[test: nonspam] ../data/000/039 result: PASS [0.76]
[test: nonspam] ../data/000/040 result: PASS [0.70]
[test: nonspam] ../data/000/041 result: PASS [0.71]
[test: nonspam] ../data/000/042 result: PASS [0.76]
[test: nonspam] ../data/000/043 result: PASS [0.65]
[test: nonspam] ../data/000/044 result: PASS [0.86]
[test: nonspam] ../data/000/045 result: PASS [0.85]
[test: nonspam] ../data/000/046 result: PASS [0.85]
[test: nonspam] ../data/000/047 result: PASS [0.72]
[test: nonspam] ../data/000/048 result: PASS [0.85]
[test: nonspam] ../data/000/049 result: PASS [1.00]
[test: nonspam] ../data/000/050 result: PASS [0.99]
[test: nonspam] ../data/000/051 result: PASS [0.99]
[test: nonspam] ../data/000/052 result: PASS [0.99]
[test: nonspam] ../data/000/053 result: PASS [0.57]
[test: nonspam] ../data/000/054 result: PASS [0.75]
[test: nonspam] ../data/000/055 result: PASS [0.85]
[test: nonspam] ../data/000/056 result: PASS [0.85]
[test: nonspam] ../data/000/057 result: PASS [0.67]
[test: nonspam] ../data/000/058 result: PASS [0.70]
[test: nonspam] ../data/000/059 result: PASS [0.99]
[test: nonspam] ../data/000/060 result: PASS [0.85]
[test: nonspam] ../data/000/061 result: PASS [0.75]
[test: nonspam] ../data/000/062 result: PASS [0.99]
[test: nonspam] ../data/000/063 result: PASS [0.70]
[test: nonspam] ../data/000/064 result: PASS [0.85]
[test: nonspam] ../data/000/065 result: PASS [0.85]
[test: nonspam] ../data/000/066 result: PASS [0.99]
[test: nonspam] ../data/000/067 result: PASS [0.76]
[test: nonspam] ../data/000/068 result: PASS [0.76]
[test: nonspam] ../data/000/069 result: PASS [0.85]
[test: nonspam] ../data/000/070 result: PASS [0.75]
[test: nonspam] ../data/000/071 result: PASS [0.75]
[test: nonspam] ../data/000/072 result: PASS [0.83]
[test: nonspam] ../data/000/073 result: PASS [0.61]
[test: nonspam] ../data/000/074 result: PASS [0.53]
[test: nonspam] ../data/000/075 result: PASS [0.99]
[test: nonspam] ../data/000/076 result: PASS [0.61]
[test: spam ] ../data/000/077 result: PASS [0.99]
[test: spam ] ../data/000/078 result: PASS [0.99]
[test: spam ] ../data/000/079 result: PASS [0.99]
[test: spam ] ../data/000/080 result: PASS [0.99]
[test: spam ] ../data/000/081 result: PASS [0.99]
[test: spam ] ../data/000/082 result: PASS [0.99]
[test: spam ] ../data/000/083 result: PASS [0.99]
[test: spam ] ../data/000/084 result: PASS [0.99]
[test: spam ] ../data/000/085 result: PASS [0.99]
[test: spam ] ../data/000/086 result: PASS [0.99]
[test: spam ] ../data/000/087 result: PASS [0.99]
[test: spam ] ../data/000/088 result: PASS [0.99]
[test: spam ] ../data/000/089 result: PASS [0.99]
[test: spam ] ../data/000/090 result: PASS [0.99]
[test: spam ] ../data/000/091 result: PASS [0.99]
[test: spam ] ../data/000/092 result: PASS [0.99]
[test: spam ] ../data/000/093 result: PASS [0.99]
[test: spam ] ../data/000/094 result: PASS [0.99]
[test: spam ] ../data/000/095 result: PASS [0.99]
[test: spam ] ../data/000/096 result: PASS [0.99]
[test: spam ] ../data/000/097 result: PASS [0.99]
[test: spam ] ../data/000/098 result: PASS [0.99]
[test: spam ] ../data/000/099 result: PASS [0.99]
[test: spam ] ../data/000/100 result: PASS [0.99]
[test: spam ] ../data/000/101 result: PASS [0.99]
[test: spam ] ../data/000/102 result: PASS [0.99]
[test: spam ] ../data/000/103 result: PASS [0.99]
[test: spam ] ../data/000/104 result: PASS [0.99]
[test: spam ] ../data/000/105 result: PASS [0.99]
[test: spam ] ../data/000/106 result: PASS [0.99]
[test: spam ] ../data/000/107 result: PASS [0.99]
[test: spam ] ../data/000/108 result: PASS [0.99]
[test: spam ] ../data/000/109 result: PASS [0.99]
[test: spam ] ../data/000/110 result: PASS [0.99]
[test: spam ] ../data/000/111 result: PASS [0.99]
[test: spam ] ../data/000/112 result: PASS [0.99]
[test: spam ] ../data/000/113 result: PASS [0.99]
[test: spam ] ../data/000/114 result: PASS [0.99]
[test: spam ] ../data/000/115 result: PASS [0.99]
[test: spam ] ../data/000/116 result: PASS [0.99]
[test: spam ] ../data/000/117 result: PASS [0.84]
[test: spam ] ../data/000/118 result: PASS [0.99]
[test: spam ] ../data/000/119 result: PASS [0.99]
[test: spam ] ../data/000/120 result: PASS [0.99]
[test: spam ] ../data/000/121 result: PASS [0.99]
[test: nonspam] ../data/000/122 result: PASS [0.76]
[test: nonspam] ../data/000/123 result: PASS [0.62]
[test: spam ] ../data/000/124 result: PASS [0.84]
[test: spam ] ../data/000/125 result: PASS [0.99]
[test: spam ] ../data/000/126 result: PASS [0.99]
[test: spam ] ../data/000/127 result: PASS [0.99]
[test: spam ] ../data/000/128 result: PASS [0.99]
[test: spam ] ../data/000/129 result: PASS [0.99]
[test: spam ] ../data/000/130 result: PASS [0.99]
[test: spam ] ../data/000/131 result: PASS [0.99]
[test: spam ] ../data/000/132 result: PASS [0.99]
[test: spam ] ../data/000/133 result: PASS [0.99]
[test: spam ] ../data/000/134 result: PASS [0.99]
[test: spam ] ../data/000/135 result: PASS [0.99]
[test: spam ] ../data/000/136 result: PASS [0.99]
[test: spam ] ../data/000/137 result: PASS [0.99]
[test: spam ] ../data/000/138 result: PASS [0.89]
[test: spam ] ../data/000/139 result: PASS [0.89]
[test: spam ] ../data/000/140 result: PASS [0.89]
[test: spam ] ../data/000/141 result: PASS [0.89]
[test: spam ] ../data/000/142 result: PASS [0.89]
[test: spam ] ../data/000/143 result: PASS [0.89]
[test: spam ] ../data/000/144 result: PASS [0.99]
[test: spam ] ../data/000/145 result: PASS [0.99]
[test: spam ] ../data/000/146 result: PASS [0.99]
[test: spam ] ../data/000/147 result: PASS [0.99]
[test: spam ] ../data/000/148 result: PASS [0.99]
[test: spam ] ../data/000/149 result: PASS [0.99]
[test: spam ] ../data/000/150 result: PASS [0.99]
[test: spam ] ../data/000/151 result: PASS [0.99]
[test: spam ] ../data/000/152 result: PASS [0.99]
[test: spam ] ../data/000/153 result: PASS [0.99]
[test: spam ] ../data/000/154 result: PASS [0.99]
[test: spam ] ../data/000/155 result: PASS [0.99]
[test: spam ] ../data/000/156 result: PASS [0.99]
[test: spam ] ../data/000/157 result: PASS [0.89]
[test: spam ] ../data/000/158 result: PASS [0.99]
[test: spam ] ../data/000/159 result: PASS [0.99]
[test: spam ] ../data/000/160 result: PASS [0.99]
[test: spam ] ../data/000/161 result: PASS [0.78]
[test: spam ] ../data/000/162 result: PASS [0.83]
[test: spam ] ../data/000/163 result: PASS [0.99]
[test: spam ] ../data/000/164 result: PASS [0.99]
[test: spam ] ../data/000/165 result: PASS [0.99]
[test: spam ] ../data/000/166 result: PASS [0.99]
[test: spam ] ../data/000/167 result: PASS [0.99]
[test: spam ] ../data/000/168 result: PASS [0.99]
[test: spam ] ../data/000/169 result: PASS [0.86]
[test: spam ] ../data/000/170 result: PASS [0.78]
[test: spam ] ../data/000/171 result: PASS [0.79]
[test: spam ] ../data/000/172 result: PASS [0.99]
[test: spam ] ../data/000/173 result: PASS [0.99]
[test: spam ] ../data/000/174 result: PASS [0.99]
[test: spam ] ../data/000/175 result: PASS [0.99]
[test: spam ] ../data/000/176 result: PASS [0.99]
[test: spam ] ../data/000/177 result: PASS [0.99]
[test: spam ] ../data/000/178 result: PASS [0.99]
[test: spam ] ../data/000/179 result: PASS [0.99]
[test: spam ] ../data/000/180 result: PASS [0.99]
[test: spam ] ../data/000/181 result: PASS [0.99]
[test: spam ] ../data/000/182 result: PASS [0.99]
[test: spam ] ../data/000/183 result: PASS [0.99]
[test: spam ] ../data/000/184 result: PASS [0.91]
[test: spam ] ../data/000/185 result: PASS [0.99]
[test: spam ] ../data/000/186 result: PASS [0.99]
[test: spam ] ../data/000/187 result: PASS [0.99]
[test: spam ] ../data/000/188 result: PASS [0.99]
[test: spam ] ../data/000/189 result: PASS [0.99]
[test: spam ] ../data/000/190 result: PASS [0.99]
[test: spam ] ../data/000/191 result: PASS [0.99]
[test: spam ] ../data/000/192 result: PASS [0.99]
[test: spam ] ../data/000/193 result: PASS [0.99]
[test: spam ] ../data/000/194 result: PASS [0.99]
[test: spam ] ../data/000/195 result: PASS [0.99]
[test: spam ] ../data/000/196 result: PASS [0.99]
[test: spam ] ../data/000/197 result: PASS [0.99]
[test: spam ] ../data/000/198 result: PASS [0.99]
[test: spam ] ../data/000/199 result: PASS [0.99]
TRAINING COMPLETE
=================================================================
Processed: 200 | TP: 121 | TN: 79 | FP: 0 | FN: 0
=================================================================
Training Snapshot:
mergedglobal TP: 0 TN: 0 FP: 0 FN: 0 SC: 0 NC: 0
SHR: 100.00% HSR: 0.00% OCA: 100.00%
Overall Statistics:
mergedglobal TP: 0 TN: 0 FP: 1 FN: 0 SC: 93406 NC: 52955
SHR: 100.00% HSR: 100.00% OCA: 0.00%
theia full # ../../../../dspam_train_tone_v4
ERROR: spam corpus must be path to maildir directory or MBOX file.
Usage: ../../../../dspam_train_tone_v4
    [username]                      DSPAM user name
    [--client]                      To run in client mode
    [--refute]                      To unlearn errors from opposite class
    [--subject]                     To show subject from error/unlearn/TONE
    [--max-retrain max_retrain]     Maximum relearns per error/TONE
    [--spam-threshold threshold]    TONE Spam threshold
    [--ham-threshold threshold]     TONE Ham threshold
    [[-i index]|[spam_dir] [nonspam_dir]]
theia full #

I train quite differently than most people do.
I don't train on signatures, and I use a double-sided asymmetric threshold as a reinforcement zone for TONE (Train On Error or Near Error) training. The example above does not use it, but if I re-run it with a threshold (set to an insane value, just to show how it works), the output looks like this:

theia full # ../../../../dspam_train_tone_v4 mergedglobal -s -r -m 3 -st 80 -ht 40 -i 200-index
Taking Snapshot...
mergedglobal TP: 0 TN: 0 FP: 1 FN: 0 SC: 93406 NC: 52955
=================================================================
Training corpora:
    Using index file: 200-index
Parameters:
    Show subject: Yes
    Refute: Yes
    Spam TONE Threshold: 0.8
    Ham TONE Threshold: 0.4
    Maximum retrain: 3
=================================================================
Training on 200-index index...
[test: nonspam] ../data/000/000 result: PASS [0.99]
[test: nonspam] ../data/000/001 result: PASS [0.75]
[test: nonspam] ../data/000/002 result: PASS [0.84]
[test: nonspam] ../data/000/003 result: PASS [0.85]
[test: nonspam] ../data/000/004 result: PASS [0.85]
[test: nonspam] ../data/000/005 result: PASS [0.99]
[test: nonspam] ../data/000/006 result: PASS [0.76]
[test: nonspam] ../data/000/007 result: PASS [0.76]
[test: nonspam] ../data/000/008 result: PASS [0.59]
[test: nonspam] ../data/000/009 result: PASS [0.85]
[test: nonspam] ../data/000/010 result: PASS [0.56]
[test: nonspam] ../data/000/011 result: PASS [0.69]
[test: nonspam] ../data/000/012 result: PASS [0.93]
[test: nonspam] ../data/000/013 result: PASS [0.75]
[test: nonspam] ../data/000/014 result: PASS [0.56]
[test: nonspam] ../data/000/015 result: PASS [0.99]
[test: nonspam] ../data/000/016 result: PASS [0.76]
[test: nonspam] ../data/000/017 result: PASS [0.91]
[test: nonspam] ../data/000/018 result: PASS [0.99]
[test: nonspam] ../data/000/019 result: PASS [0.85]
[test: nonspam] ../data/000/020 result: PASS [0.68]
[test: nonspam] ../data/000/021 result: PASS [0.70]
[test: nonspam] ../data/000/022 result: PASS [0.83]
[test: nonspam] ../data/000/023 result: PASS [0.85]
[test: nonspam] ../data/000/024 result: PASS [0.56]
[test: nonspam] ../data/000/025 result: PASS [0.65]
[test: nonspam] ../data/000/026 result: PASS [0.92]
[test: nonspam] ../data/000/027 result: PASS [1.00]
[test: nonspam] ../data/000/028 result: PASS [0.80]
[test: nonspam] ../data/000/029 result: PASS [0.65]
[test: nonspam] ../data/000/030 result: PASS [0.65]
[test: nonspam] ../data/000/031 result: PASS [0.62]
[test: nonspam] ../data/000/032 result: PASS [0.69]
[test: nonspam] ../data/000/033 result: PASS [0.99]
[test: nonspam] ../data/000/034 result: PASS [0.75]
[test: nonspam] ../data/000/035 result: PASS [0.96]
[test: nonspam] ../data/000/036 result: PASS [0.69]
[test: nonspam] ../data/000/037 result: PASS [0.60]
[test: nonspam] ../data/000/038 result: PASS [0.99]
[test: nonspam] ../data/000/039 result: PASS [0.76]
[test: nonspam] ../data/000/040 result: PASS [0.70]
[test: nonspam] ../data/000/041 result: PASS [0.71]
[test: nonspam] ../data/000/042 result: PASS [0.76]
[test: nonspam] ../data/000/043 result: PASS [0.65]
[test: nonspam] ../data/000/044 result: PASS [0.86]
[test: nonspam] ../data/000/045 result: PASS [0.85]
[test: nonspam] ../data/000/046 result: PASS [0.85]
[test: nonspam] ../data/000/047 result: PASS [0.72]
[test: nonspam] ../data/000/048 result: PASS [0.85]
[test: nonspam] ../data/000/049 result: PASS [1.00]
[test: nonspam] ../data/000/050 result: PASS [0.99]
[test: nonspam] ../data/000/051 result: PASS [0.99]
[test: nonspam] ../data/000/052 result: PASS [0.99]
[test: nonspam] ../data/000/053 result: PASS [0.57]
[test: nonspam] ../data/000/054 result: PASS [0.75]
[test: nonspam] ../data/000/055 result: PASS [0.85]
[test: nonspam] ../data/000/056 result: PASS [0.85]
[test: nonspam] ../data/000/057 result: PASS [0.67]
[test: nonspam] ../data/000/058 result: PASS [0.70]
[test: nonspam] ../data/000/059 result: PASS [0.99]
[test: nonspam] ../data/000/060 result: PASS [0.85]
[test: nonspam] ../data/000/061 result: PASS [0.75]
[test: nonspam] ../data/000/062 result: PASS [0.99]
[test: nonspam] ../data/000/063 result: PASS [0.70]
[test: nonspam] ../data/000/064 result: PASS [0.85]
[test: nonspam] ../data/000/065 result: PASS [0.85]
[test: nonspam] ../data/000/066 result: PASS [0.99]
[test: nonspam] ../data/000/067 result: PASS [0.76]
[test: nonspam] ../data/000/068 result: PASS [0.76]
[test: nonspam] ../data/000/069 result: PASS [0.85]
[test: nonspam] ../data/000/070 result: PASS [0.75]
[test: nonspam] ../data/000/071 result: PASS [0.75]
[test: nonspam] ../data/000/072 result: PASS [0.83]
[test: nonspam] ../data/000/073 result: PASS [0.61]
[test: nonspam] ../data/000/074 result: PASS [0.53]
[test: nonspam] ../data/000/075 result: PASS [0.99]
[test: nonspam] ../data/000/076 result: PASS [0.61]
[test: spam ] ../data/000/077 result: PASS [0.99]
[test: spam ] ../data/000/078 result: PASS [0.99]
[test: spam ] ../data/000/079 result: PASS [0.99]
[test: spam ] ../data/000/080 result: PASS [0.99]
[test: spam ] ../data/000/081 result: PASS [0.99]
[test: spam ] ../data/000/082 result: PASS [0.99]
[test: spam ] ../data/000/083 result: PASS [0.99]
[test: spam ] ../data/000/084 result: PASS [0.99]
[test: spam ] ../data/000/085 result: PASS [0.99]
[test: spam ] ../data/000/086 result: PASS [0.99]
[test: spam ] ../data/000/087 result: PASS [0.99]
[test: spam ] ../data/000/088 result: PASS [0.99]
[test: spam ] ../data/000/089 result: PASS [0.99]
[test: spam ] ../data/000/090 result: PASS [0.99]
[test: spam ] ../data/000/091 result: PASS [0.99]
[test: spam ] ../data/000/092 result: PASS [0.99]
[test: spam ] ../data/000/093 result: PASS [0.99]
[test: spam ] ../data/000/094 result: PASS [0.99]
[test: spam ] ../data/000/095 result: PASS [0.99]
[test: spam ] ../data/000/096 result: PASS [0.99]
[test: spam ] ../data/000/097 result: PASS [0.99]
[test: spam ] ../data/000/098 result: PASS [0.99]
[test: spam ] ../data/000/099 result: PASS [0.99]
[test: spam ] ../data/000/100 result: PASS [0.99]
[test: spam ] ../data/000/101 result: PASS [0.99]
[test: spam ] ../data/000/102 result: PASS [0.99]
[test: spam ] ../data/000/103 result: PASS [0.99]
[test: spam ] ../data/000/104 result: PASS [0.99]
[test: spam ] ../data/000/105 result: PASS [0.99]
[test: spam ] ../data/000/106 result: PASS [0.99]
[test: spam ] ../data/000/107 result: PASS [0.99]
[test: spam ] ../data/000/108 result: PASS [0.99]
[test: spam ] ../data/000/109 result: PASS [0.99]
[test: spam ] ../data/000/110 result: PASS [0.99]
[test: spam ] ../data/000/111 result: PASS [0.99]
[test: spam ] ../data/000/112 result: PASS [0.99]
[test: spam ] ../data/000/113 result: PASS [0.99]
[test: spam ] ../data/000/114 result: PASS [0.99]
[test: spam ] ../data/000/115 result: PASS [0.99]
[test: spam ] ../data/000/116 result: PASS [0.99]
[test: spam ] ../data/000/117 result: PASS [0.84]
[test: spam ] ../data/000/118 result: PASS [0.99]
[test: spam ] ../data/000/119 result: PASS [0.99]
[test: spam ] ../data/000/120 result: PASS [0.99]
[test: spam ] ../data/000/121 result: PASS [0.99]
[test: nonspam] ../data/000/122 result: PASS [0.76]
[test: nonspam] ../data/000/123 result: PASS [0.62]
[test: spam ] ../data/000/124 result: PASS [0.84]
[test: spam ] ../data/000/125 result: PASS [0.99]
[test: spam ] ../data/000/126 result: PASS [0.99]
[test: spam ] ../data/000/127 result: PASS [0.99]
[test: spam ] ../data/000/128 result: PASS [0.99]
[test: spam ] ../data/000/129 result: PASS [0.99]
[test: spam ] ../data/000/130 result: PASS [0.99]
[test: spam ] ../data/000/131 result: PASS [0.99]
[test: spam ] ../data/000/132 result: PASS [0.99]
[test: spam ] ../data/000/133 result: PASS [0.99]
[test: spam ] ../data/000/134 result: PASS [0.99]
[test: spam ] ../data/000/135 result: PASS [0.99]
[test: spam ] ../data/000/136 result: PASS [0.99]
[test: spam ] ../data/000/137 result: PASS [0.99]
[test: spam ] ../data/000/138 result: PASS [0.89]
[test: spam ] ../data/000/139 result: PASS [0.89]
[test: spam ] ../data/000/140 result: PASS [0.89]
[test: spam ] ../data/000/141 result: PASS [0.89]
[test: spam ] ../data/000/142 result: PASS [0.89]
[test: spam ] ../data/000/143 result: PASS [0.89]
[test: spam ] ../data/000/144 result: PASS [0.99]
[test: spam ] ../data/000/145 result: PASS [0.99]
[test: spam ] ../data/000/146 result: PASS [0.99]
[test: spam ] ../data/000/147 result: PASS [0.99]
[test: spam ] ../data/000/148 result: PASS [0.99]
[test: spam ] ../data/000/149 result: PASS [0.99]
[test: spam ] ../data/000/150 result: PASS [0.99]
[test: spam ] ../data/000/151 result: PASS [0.99]
[test: spam ] ../data/000/152 result: PASS [0.99]
[test: spam ] ../data/000/153 result: PASS [0.99]
[test: spam ] ../data/000/154 result: PASS [0.99]
[test: spam ] ../data/000/155 result: PASS [0.99]
[test: spam ] ../data/000/156 result: PASS [0.99]
[test: spam ] ../data/000/157 result: PASS [0.89]
[test: spam ] ../data/000/158 result: PASS [0.99]
[test: spam ] ../data/000/159 result: PASS [0.99]
[test: spam ] ../data/000/160 result: PASS [0.99]
[test: spam ] ../data/000/161 result: PASS [0.78]
[tone] Subject: =?utf-8?q?Pimpware?=
[test: spam ] ../data/000/161 result: PASS [0.82]
[test: spam ] ../data/000/162 result: PASS [0.83]
[test: spam ] ../data/000/163 result: PASS [0.99]
[test: spam ] ../data/000/164 result: PASS [0.99]
[test: spam ] ../data/000/165 result: PASS [0.99]
[test: spam ] ../data/000/166 result: PASS [0.99]
[test: spam ] ../data/000/167 result: PASS [0.99]
[test: spam ] ../data/000/168 result: PASS [0.99]
[test: spam ] ../data/000/169 result: PASS [0.86]
[test: spam ] ../data/000/170 result: PASS [0.78]
[tone] Subject: =?utf-8?q?This report must mod?=
[test: spam ] ../data/000/170 result: PASS [0.87]
[test: spam ] ../data/000/171 result: PASS [0.79]
[tone] Subject: =?utf-8?q?This dispatch must a?=
[test: spam ] ../data/000/171 result: PASS [0.80]
[tone] Subject: =?utf-8?q?This dispatch must a?=
[test: spam ] ../data/000/171 result: PASS [0.82]
[test: spam ] ../data/000/172 result: PASS [0.99]
[test: spam ] ../data/000/173 result: PASS [0.99]
[test: spam ] ../data/000/174 result: PASS [0.99]
[test: spam ] ../data/000/175 result: PASS [0.99]
[test: spam ] ../data/000/176 result: PASS [0.99]
[test: spam ] ../data/000/177 result: PASS [0.99]
[test: spam ] ../data/000/178 result: PASS [0.99]
[test: spam ] ../data/000/179 result: PASS [0.99]
[test: spam ] ../data/000/180 result: PASS [0.99]
[test: spam ] ../data/000/181 result: PASS [0.99]
[test: spam ] ../data/000/182 result: PASS [0.99]
[test: spam ] ../data/000/183 result: PASS [0.99]
[test: spam ] ../data/000/184 result: PASS [0.91]
[test: spam ] ../data/000/185 result: PASS [0.99]
[test: spam ] ../data/000/186 result: PASS [0.99]
[test: spam ] ../data/000/187 result: PASS [0.99]
[test: spam ] ../data/000/188 result: PASS [0.99]
[test: spam ] ../data/000/189 result: PASS [0.99]
[test: spam ] ../data/000/190 result: PASS [0.99]
[test: spam ] ../data/000/191 result: PASS [0.99]
[test: spam ] ../data/000/192 result: PASS [0.99]
[test: spam ] ../data/000/193 result: PASS [0.99]
[test: spam ] ../data/000/194 result: PASS [0.99]
[test: spam ] ../data/000/195 result: PASS [0.99]
[test: spam ] ../data/000/196 result: PASS [0.99]
[test: spam ] ../data/000/197 result: PASS [0.99]
[test: spam ] ../data/000/198 result: PASS [0.99]
[test: spam ] ../data/000/199 result: PASS [0.99]
TRAINING COMPLETE
=================================================================
Processed: 200 | TP: 121 | TN: 79 | FP: 0 | FN: 0
=================================================================
Training Snapshot:
mergedglobal TP: 0 TN: 0 FP: 0 FN: 0 SC: 4 NC: 0
SHR: 100.00% HSR: 0.00% OCA: 100.00%
Overall Statistics:
mergedglobal TP: 0 TN: 0 FP: 1 FN: 0 SC: 93410 NC: 52955
SHR: 100.00% HSR: 100.00% OCA: 0.00%
theia full #

As you can see, messages scoring below my threshold (0.80 for spam and 0.40 for ham) got learned again, even though the initial classification had already put them in the correct class.
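The TONE rule described above can be summarised in a few lines. This is not the code of dspam_train_tone_v4 (its source isn't shown here); it is a minimal Python sketch assuming two hypothetical callbacks, `classify(message)` returning a `(verdict_is_spam, confidence)` pair and `learn(message, is_spam)` teaching the filter. A message is relearned when it is misclassified, or when it lands in the right class with a confidence below the threshold for that class, at most `max_retrain` times. Thresholds are fractions (0.8, 0.4), as in the "Parameters" banner; the `--refute` unlearning step is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical callback types (not a real DSPAM API):
# classify() -> (verdict_is_spam, confidence); learn() trains one message.
Classify = Callable[[bytes], Tuple[bool, float]]
Learn = Callable[[bytes, bool], None]


@dataclass
class ToneParams:
    spam_threshold: float = 0.0  # 0 disables near-error retraining for spam
    ham_threshold: float = 0.0   # 0 disables near-error retraining for ham
    max_retrain: int = 1         # upper bound on relearns per message


def tone_train(message: bytes, is_spam: bool, classify: Classify,
               learn: Learn, params: ToneParams) -> Tuple[str, float, int]:
    """Train on error or near error; return (result, confidence, retrains)."""
    threshold = params.spam_threshold if is_spam else params.ham_threshold
    retrains = 0
    while True:
        verdict, confidence = classify(message)
        correct = verdict == is_spam
        # Stop when the verdict is right and outside the reinforcement zone,
        # or when the retrain budget is used up.
        if (correct and confidence >= threshold) or retrains >= params.max_retrain:
            return ("PASS" if correct else "FAIL", confidence, retrains)
        # Error, or correct but below the threshold: learn the message again.
        learn(message, is_spam)
        retrains += 1


def scripted_filter(confidences, verdict=True):
    """Fake filter for the demo: each learn() advances to the next confidence."""
    state = {"i": 0}

    def classify(_msg: bytes) -> Tuple[bool, float]:
        return verdict, confidences[state["i"]]

    def learn(_msg: bytes, _is_spam: bool) -> None:
        state["i"] += 1

    return classify, learn


if __name__ == "__main__":
    classify, learn = scripted_filter([0.78, 0.79, 0.82])
    print(tone_train(b"...", True, classify, learn, ToneParams(0.8, 0.4, 3)))
    # prints ('PASS', 0.82, 2)
```

With both thresholds at 0 the loop reduces to plain train-on-error, which matches the first run above; bounding it by `max_retrain` keeps a stubborn message from being learned indefinitely.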
Running the whole test on a rather slow system takes me less time than it would with the original dspam_train script. And it doesn't add any data to the dspam_signature_data table. This technique lets me just run the training script, knowing that it will not kill my storage backend with useless data. Running the original dspam_train script leads to roughly 1 GB of data in dspam_signature_data (the TREC05 corpus is around 962 MB), plus more data in dspam_token_data (which is normal when the training script is learning), and on top of that it blows up my MySQL replication log for nothing (taking up even more space). I know that I can purge the DSPAM tables and the MySQL transaction/replication log. But I did not want to fix a problem by cleaning up after it. My goal was, and still is, to keep the problem from showing up in the first place.

I initially wrote the script for a spam-catching address: everything sent to that address is delivered to a Maildir, and during the night the training script runs against that Maildir and learns everything delivered there during the day. But I did not like the original script's method, where every message gets a signature and errors are then relearned based on that signature. That is too heavy for a spam-catching/honeypot system. So I started coding a new version and later merged it with the original dspam_train, since I did not want two scripts doing more or less the same thing.

I don't know how others train, but I guess most of them have their own tools for the task. I can't imagine that they use the old and heavy dspam_train. But I could be wrong; I don't really know, since I have never asked. I only know that Paul Cockings asked some time ago how to build something like that, and I replied with some questions that he never answered.
> >> Best Regards,
> >> Carlo Rodrigues
> >>

-- 
Kind Regards from Switzerland,

Stevan Bajić

_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel