On 2009-05-02, at 15:39, Vernon Schryver wrote:
From: Michał Grzędzicki <[email protected]>
By default, SpamAssassin uses 99999 as dcc_body/fuz1/fuz2_max, which is the
same as DCC's "many".
This is only 1/6th of the messages.
Are you referring to the difference between 19% tagged as "many" and
the 51% with bulky counts according to the graphs for your server at
https://www.rhyolite.com/dcc/private/.... ?
Yes. So trapped spam is 'many', and likely spam is something around 10?
That is a low value. Are you doing DCC filtering after other filters?
Only some RBL blackholing and SPF checks are done before it.
I'm talking only about messages that get the DCC_CHECK score for
having "many" occurrences, and that is only 17,000 out of 102,000.
This is caused by SpamAssassin always requiring "many" reports.
I'm planning to add variable scoring to SpamAssassin's DCC.pm to make
it more useful (right now only messages with "many" reports are flagged).
I'm thinking of 40 reports getting 1/10 of the base score, up to 10,000
reports (or "many"; where does that start?) getting the whole base score.
500 reports could be treated as likely spam with 1/2 of the base score;
in between, maybe use two linear functions or one of higher order.
The base score should be around 4/5 of the mark-as-spam score.
What would be good thresholds for very unlikely spam, likely spam,
and surely spam?
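A minimal Python sketch of the proposed two-segment linear scoring, assuming the breakpoints from the message (40 reports get 1/10, 500 get 1/2, 10,000 or "many" get the whole base score), a base score of 4/5 of SpamAssassin's required_score (assumed here to be the default 5.0), and no score below 40 reports. BREAKPOINTS, dcc_score_fraction, and dcc_score are hypothetical names; this is not code from DCC.pm.

```python
# Hypothetical breakpoints from the proposal: (report count, fraction of base score)
BREAKPOINTS = [(40, 0.1), (500, 0.5), (10_000, 1.0)]


def dcc_score_fraction(count):
    """Fraction of the base score for a DCC report count.

    `count` is an int, or None for DCC's "many". Counts below the first
    breakpoint score 0.0, counts at or above the last score 1.0, and
    counts in between are linearly interpolated between neighbouring
    breakpoints (two linear segments).
    """
    if count is None:  # "many" gets the whole base score
        return 1.0
    if count < BREAKPOINTS[0][0]:
        return 0.0
    for (x0, y0), (x1, y1) in zip(BREAKPOINTS, BREAKPOINTS[1:]):
        if count < x1:
            return y0 + (y1 - y0) * (count - x0) / (x1 - x0)
    return 1.0


def dcc_score(count, required_score=5.0):
    """Score to add: base score (4/5 of required_score) times the fraction."""
    base = 0.8 * required_score
    return base * dcc_score_fraction(count)
```

For example, with required_score 5.0 the base score is 4.0, so 500 reports add 2.0 and 5,250 reports add 3.0. Swapping the two segments for one higher-order curve only means replacing the interpolation line.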
I doubt that would help. The DCC detects bulk email. Spam is
unsolicited
bulk email. Mail messages that have been seen 100 or 10,000 times are
equally bulky, and neither is more likely to be spam. Contrast Amazon
online order confirmations with Amazon advertisements. Both are very
bulky, but only some of the Amazon advertisements are spam.
That is why I have always said the best way to use DCC is with per-user
whitelists. Each user's whitelist indicates which streams of bulk mail
are solicited.
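A minimal sketch of the per-user whitelist idea, assuming a bulk stream is identified by its sender address and each user's whitelist is a set of lowercase addresses. Both assumptions, and the classify function itself, are hypothetical illustrations, not how DCC stores or applies whitelists.

```python
from typing import Set


def classify(sender: str, is_bulk: bool, user_whitelist: Set[str]) -> str:
    """Classify one message for one user.

    DCC only says whether the message is bulk; the user's whitelist says
    whether that bulk stream is solicited. Addresses in the whitelist are
    assumed to be stored lowercase.
    """
    if not is_bulk:
        return "not-bulk"
    if sender.lower() in user_whitelist:
        return "solicited-bulk"  # e.g. order confirmations the user wants
    return "unsolicited-bulk"  # bulk and not whitelisted: likely spam
```

The same bulky message can be "solicited-bulk" for one user and "unsolicited-bulk" for another, which is the point: the bulk count alone does not decide.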
I think the SpamAssassin threshold of "many"/99999 is far too high.
The SpamAssassin conversion of "many" to 99999 is a kludge that should
not have been coded.
Instead, SpamAssassin should look for "bulk" in the X-DCC header,
and the dccifd or dccproc thresholds should tell dccifd or dccproc
whether to add "bulk". See DCCM_REJECT_AT
in /var/dcc/dcc_conf. See also -c and -t in the dccproc and dccifd
man pages; -t could be added to DCCIFD_ARGS.
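A sketch of checking for "bulk" instead of comparing counts to 99999. It assumes the header value has the shape "<host> <server-id>; bulk Body=... Fuz1=... Fuz2=...", which is an assumption here; check a real X-DCC header from your own dccifd before relying on it. parse_x_dcc is a hypothetical helper, not part of DCC or SpamAssassin.

```python
import re

# Assumed value shape (verify against your own headers):
#   "<host> <server-id>; bulk Body=1 Fuz1=many Fuz2=many"
_COUNT_RE = re.compile(r"\b(Body|Fuz1|Fuz2)=(\d+|many)\b", re.IGNORECASE)


def parse_x_dcc(value: str):
    """Return (is_bulk, counts) for an X-DCC header value.

    counts maps "body"/"fuz1"/"fuz2" to an int, or None for "many".
    Everything after the first ';' is treated as the metrics part; if
    there is no ';', the whole value is used.
    """
    metrics = value.split(";", 1)[-1]
    # dccifd/dccproc add the "bulk" token when their own thresholds are met
    is_bulk = "bulk" in metrics.lower().split()
    counts = {}
    for name, raw in _COUNT_RE.findall(metrics):
        counts[name.lower()] = None if raw.lower() == "many" else int(raw)
    return is_bulk, counts
```

With this approach the bulk threshold lives in one place, the dccifd or dccproc configuration, and the filter only reacts to the "bulk" token.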
You are right; for now I will lower the requirements in SpamAssassin.
Maybe in the future I will add some "bonus" points to popular spams with
more than 1000 reports, to make sure they are flagged as spam even if
other filters didn't catch them.
DCC.pm checks for "X-DCC: bulk" only if it has been added upstream.
dcc_conf mentions 50 as the bulk mail count, so I will start with
something around 200,
with a lower score value (most of the spams that get through already
have some points from other checks).
Whitelisted emails aren't scanned at all, so I don't have to worry about
that at the DCC level.
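The plan above (start scoring at about 200 reports with a low score, add bonus points above 1000) can be sketched as a simple step function. The thresholds come from the message; LOW_SCORE, BONUS_SCORE, and tiered_dcc_score are hypothetical values and names to be tuned per site.

```python
LOW_THRESHOLD = 200     # start scoring here (dcc_conf mentions 50 as bulk)
BONUS_THRESHOLD = 1000  # "popular" spam gets extra points above this
LOW_SCORE = 1.0         # hypothetical score values; tune for your site
BONUS_SCORE = 1.5


def tiered_dcc_score(count):
    """Step-function score for a DCC count (int, or None for "many")."""
    if count is None:  # treat "many" as above every threshold
        count = float("inf")
    if count < LOW_THRESHOLD:
        return 0.0
    score = LOW_SCORE
    if count > BONUS_THRESHOLD:
        score += BONUS_SCORE
    return score
```

A low first step keeps DCC from flagging mail on its own, while still pushing messages that already have points from other checks over the spam threshold.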
I'm guessing this is the right ordering of the body, fuz1, and fuz2
checksums, with body getting the most reports and fuz2 the fewest.
Is this right?
If I understand the question, no. All of the checksums are computed
on all mail messages, but only reports of the most bulky checksums are
flooded among DCC servers. Body checksums are not at all fuzzy, and
so minimal personalizations can make each copy of spam have differing
DCC body checksums.
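To illustrate why a non-fuzzy body checksum is defeated by small personalizations, here is a toy comparison. A plain MD5 digest stands in for an exact checksum, and toy_fuzzy_checksum uses a made-up normalization (lowercase, drop digits and whitespace); it is NOT DCC's fuz1 or fuz2 algorithm, only a demonstration of the general idea.

```python
import hashlib
import re


def exact_checksum(body: str) -> str:
    # Any changed byte (a different tracking number, an extra space) changes it.
    return hashlib.md5(body.encode("utf-8")).hexdigest()


def toy_fuzzy_checksum(body: str) -> str:
    # Toy normalization, not DCC's: ignore case, digits, and whitespace.
    normalized = re.sub(r"[\d\s]+", "", body.lower())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Two copies of the same spam, "personalized" with case, spacing, and a code.
copy_a = "Cheap pills! Order now.\nYour code: 48213\n"
copy_b = "Cheap  PILLS! Order now.\nYour code: 99127\n"
```

Here the exact checksums of the two copies differ, while the toy fuzzy checksums match, so only the fuzzy checksum would accumulate a bulk count across both copies.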
I guess I wasn't clear about that, sorry.
I hope this time it will be clearer.
Are fuz1 and fuz2 computed from the same parts of the email, e.g. sender,
subject, X-Client + body, or does fuz2 take more headers? In that case,
very similar spams could have the same body hash and the same fuz1 but a
different fuz2, because fuz2 takes into account the X-Client header,
which differs between the two spams. Or maybe they take the same subset
of the email (headers + body) but use a different fuzzing algorithm (like
omitting whitespace, ignoring case, etc., to ignore minor differences
between spams)?
If they use the same subset of headers + body, there's no point in
different thresholds for fuz1 and fuz2; if fuz2 takes more data as
input, it should have a smaller threshold than fuz1.
--
Michał Grzędzicki
_______________________________________________
DCC mailing list [email protected]
http://www.rhyolite.com/mailman/listinfo/dcc