Re: [Dspam-devel] To bail or not to bail?

Stevan Bajić Tue, 17 Nov 2009 15:36:35 -0800

Hello Carlo,


> 3.2% to 0.17% -> almost 19 times more effective.
I just checked the files to see how many of them have no body and this is the 
result:
theia trec05p-1 # find data -type f | while read foo ; do if [ $(sed '1,/^$/d' ${foo} | wc -l) -lt 1 ] ; then echo ${foo} ; fi ; done | wc -l
331
theia trec05p-1 # cd full
theia full # cat index | while read foo ; do if [ $(sed '1,/^$/d' ${foo/* /} | wc -l) -lt 1 ] ; then echo ${foo/* /} ; fi ; done | wc -l
331
theia full #
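
The same check can be written so that it copes with file names containing
spaces. This is only a sketch, not the commands used above: has_empty_body and
count_empty_bodies are made-up names, and it assumes that no file name
contains a newline.

```shell
# Exit status 0 when the message in "$1" has no body lines, i.e. nothing
# follows the first empty line (or there is no empty line at all).
# Note: unlike the sed range 1,/^$/, a file whose very first line is empty
# is treated here as having its headers end at that first line.
has_empty_body() {
    awk 'in_body { found = 1; exit }
         /^$/    { in_body = 1 }
         END     { exit (found ? 1 : 0) }' "$1"
}

# Print the number of regular files below directory "$1" with an empty body.
count_empty_bodies() {
    find "$1" -type f | while IFS= read -r msg; do
        if has_empty_body "$msg"; then
            printf '%s\n' "$msg"
        fi
    done | wc -l
}
```

Called as `count_empty_bodies data` from the trec05p-1 directory, it should
give the same kind of count as the find loop above.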

331 on the file system and 331 referenced in the index. I had 155 failures in 
total, of which 19 are 4MB (my MaxMessageSize) or bigger:
theia full # find ../data/ -type f -size +4M | wc -l
19
theia full #
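
One detail about the size check: GNU find rounds file sizes up to whole units,
so -size +4M only matches files larger than 4 MiB, and a file of exactly
4194304 bytes is not counted. A byte-exact variant could look like the sketch
below. list_oversize is a made-up name, and treating the limit as "at least N
bytes" is an assumption about how MaxMessageSize is applied.

```shell
# List regular files below directory "$1" that are at least "$2" bytes large.
# find's "+N" means "more than N", so N-1 is used to include exactly N bytes.
list_oversize() {
    find "$1" -type f -size +"$(( $2 - 1 ))"c
}
```

For the 4MB limit mentioned above this would be
`list_oversize ../data 4194304 | wc -l`.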

To be honest: I think that 155 failures is a good value. I don't currently see 
a big need to lower that number any further.

My training script already handles messages for which DSPAM gives no output in 
summary mode but prints the whole message when it is instructed to deliver 
innocent,spam to stdout. Here is an example with the latest dev version of my 
training script:
theia full # ../../../../dspam_train_tone_v5 mergedglobal --overleap 20345 --stop-after 10 --refute --max-train 3 --spam-threshold 80 --ham-threshold 40 -i index
Taking Snapshot...
mergedglobal
    TP:     0 TN:     0 FP:     1 FN:     0 SC: 95517 NC: 54681
====================================================================
Training corpora:
  Using index file:    index
Parameters:
  Show subject:        No
  Random:              No
  Refute:              Yes
  Spam TONE Threshold: 0.8
  Ham TONE Threshold:  0.4
  Maximum retrain:     1
  Overleap:            20345
  Stop after:          10
====================================================================

Training on index index...
[test: nonspam] ../data/067/246                  result: FAIL [#.##] (probably over MaxMessageSize)
[test: spam   ] ../data/067/247                  result: PASS [0.99]
[test: nonspam] ../data/067/248                  result: PASS [0.70]
[test: nonspam] ../data/067/249                  result: PASS [0.85]
[test: nonspam] ../data/067/250                  result: PASS [0.66]
[test: nonspam] ../data/067/251                  result: PASS [0.59]
[test: nonspam] ../data/067/252                  result: PASS [0.99]
[test: nonspam] ../data/067/253                  result: PASS [0.59]
[test: nonspam] ../data/067/254                  result: PASS [1.00]
[test: nonspam] ../data/067/255                  result: FAIL [#.##] (probably over MaxMessageSize)
TRAINING COMPLETE

====================================================================
 Processed: 10 | TP: 1 | TN: 7 | FP: 0 | FN: 0
====================================================================

Training Snapshot:
mergedglobal
    TP:     0 TN:     0 FP:     0 FN:     0 SC:     0 NC:     0
    SHR:  100.00%       HSR:    0.00%       OCA:  100.00%

Overall Statistics:
mergedglobal
    TP:     0 TN:     0 FP:     1 FN:     0 SC: 95517 NC: 54681
    SHR:  100.00%       HSR:  100.00%       OCA:    0.00%
theia full # ls -lah ../data/067/246
-rw-rw-r-- 1 root root 8.2M Aug  4  2005 ../data/067/246
theia full # ls -lah ../data/067/255
-rw-rw-r-- 1 root root 9.5M Aug  4  2005 ../data/067/255
theia full #
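
The "no output" handling described above could be sketched roughly as follows.
This is not the code of dspam_train_tone_v5: explain_result is a hypothetical
helper that only looks at whether the captured summary output is empty and how
large the message file is.

```shell
# $1 = message file, $2 = captured summary output, $3 = size limit in bytes
# (the limit is assumed to match MaxMessageSize in the DSPAM configuration).
explain_result() {
    if [ -n "$2" ]; then
        echo "OK"
        return 0
    fi
    # Arithmetic expansion strips any padding that wc may add.
    size=$(( $(wc -c < "$1") ))
    if [ "$size" -ge "$3" ]; then
        echo "FAIL (probably over MaxMessageSize)"
    else
        echo "FAIL (no output)"
    fi
}
```

With a 4194304 byte limit, an empty result for the 8.2M and 9.5M files listed
above would be reported as probably over MaxMessageSize.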

I will, however, check 3.8.0 and see whether it is better than 3.9.0. If it is, 
I might go on and enhance 3.9.0 to be on par with 3.8.0. But if 3.9.0 is 
better, I am leaving it as it is and will not invest more time in fiddling 
around with issues that are no real issues. At least no one has complained 
about it so far.


-- 
Kind Regards from Switzerland,

Stevan Bajić
