On 10/24/2012 02:34 AM, Mark Martinec wrote:
> On Tuesday October 23 2012 22:26:00 Axb wrote:
>> Spamc/Spamd's "skip size" method has made a huge *positive* difference
>> on FPs, and scan times.
>> The FNs wouldn't *ever* have been caught by a chunk method due to the
>> kind of content included "above" the threshold.
> Out of curiosity, during the last 10 days our system detected
> almost 200 large spam messages (manually confirmed spam) with
> sizes above 400 kB (of which SpamAssassin saw only the first
> 420 kB; the rest was truncated).
> Of these there were 55 distinct species:
>   17 in the 400..500 kB region
>   16 in the 500..700 kB region
>    9 in the 700..1000 kB region
>   10 in the 1000..2000 kB region
>    2 of 2.8 MB
>    1 of 3.6 MB
> The median spam score (by species) for these was Q2=15.5,
> with quartiles Q1=11 and Q3=27, so I'd say SpamAssassin did
> a good job with these. The most valuable score contributions
> seem to have come from the mail header section (subject, RBL, Bayes);
> attachment contents were probably less important.
SA's default is 512kB, right? Many people raise that to close to 1MB.
After that, how much of your checked corpus would have survived RBL
rejects at MTA level?
Such a sample doesn't convince me (yet), as it doesn't show potential FPs
due to scans of raw encoded attachments after 4 lines of text/HTML, nor
the timing per body rule type.
Could you let me have this sample corpus to compare results with
spamc/spamd under different conditions?
Axb
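The two policies argued over in this thread can be contrasted in a short sketch. The first is skipping oversized messages entirely, as spamc does with its -s/--max-size option. The second is scanning only a truncated prefix, as in the setup where SpamAssassin saw the first 420 kB. This is a simplified illustration only. The function names and sample sizes are hypothetical, and real truncation (for example, cutting at a line or MIME boundary) is more involved than a plain byte slice.

```python
from typing import Optional

KB = 1024


def skip_policy(msg: bytes, max_size: int) -> Optional[bytes]:
    """Hypothetical skip-size policy, loosely modelled on spamc's max-size:
    messages larger than max_size are not scanned at all (returns None)."""
    return msg if len(msg) <= max_size else None


def truncate_policy(msg: bytes, max_size: int) -> bytes:
    """Hypothetical truncation policy: only the first max_size bytes are
    scanned. Real implementations may cut at a line or MIME boundary."""
    return msg[:max_size]


if __name__ == "__main__":
    limit = 420 * KB               # the 420 kB prefix mentioned in the thread
    sizes_kb = [300, 450, 2800]    # hypothetical message sizes
    for size in sizes_kb:
        msg = b"x" * (size * KB)
        skipped = skip_policy(msg, limit) is None
        scanned = len(truncate_policy(msg, limit))
        print(f"{size} kB: skip policy scans {'nothing' if skipped else 'all'}, "
              f"truncate policy scans {scanned // KB} kB")
```

Under the skip policy, a 2.8 MB spam gets no score at all. Under truncation, it is still scored on its headers and first 420 kB. That is the trade-off behind Mark's observation that header-based rules carried most of the score, and behind Axb's concern about body rules matching raw encoded attachment data.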