https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155

--- Comment #132 from Mark Martinec <[email protected]> 2009-10-26 12:26:49 UTC ---
> Other whitelisting rules (HABEAS_*, RCVD_IN_IADB_*, RCVD_IN_BSP_TRUSTED etc)
> have the same scores as in the previous 50_scores.cf. 

They do not have the same scores; it seems to me that most of them are
much lower. Please ignore the comments in 50_scores_newest3.cf and
take into account only the uncommented 'score' lines:

score HABEAS_ACCREDITED_COI 0
score HABEAS_ACCREDITED_SOI 0 -1.634 0 -0.475

score RCVD_IN_BSP_TRUSTED 0 -0.001 0 -0.001

score RCVD_IN_IADB_DK 0 -0.044 0 -0.001
score RCVD_IN_IADB_DOPTIN 0
score RCVD_IN_IADB_DOPTIN_GT50 0
score RCVD_IN_IADB_DOPTIN_LT50 0 -0.001 0 -0.001
score RCVD_IN_IADB_EDDB 0
score RCVD_IN_IADB_EPIA 0
score RCVD_IN_IADB_GOODMAIL 0
score RCVD_IN_IADB_LISTED 0 -1.144 0 -0.001
score RCVD_IN_IADB_LOOSE 0
score RCVD_IN_IADB_MI_CPEAR 0
score RCVD_IN_IADB_MI_CPR_30 0
score RCVD_IN_IADB_MI_CPR_MAT 0 -0.079 0 -0.001
score RCVD_IN_IADB_ML_DOPTIN 0
score RCVD_IN_IADB_NOCONTROL 0
score RCVD_IN_IADB_OOO 0
score RCVD_IN_IADB_OPTIN 0 -3.265 0 -2.791
score RCVD_IN_IADB_OPTIN_GT50 0 -0.219 0 -1.041
score RCVD_IN_IADB_OPTIN_LT50 0
score RCVD_IN_IADB_OPTOUTONLY 0
score RCVD_IN_IADB_RDNS 0 -0.018 0 -0.001
score RCVD_IN_IADB_SENDERID 0 -0.001 0 -0.001
score RCVD_IN_IADB_SPF 0 -0.006 0 -0.042
score RCVD_IN_IADB_UNVERIFIED_1 0
score RCVD_IN_IADB_UNVERIFIED_2 0
score RCVD_IN_IADB_UT_CPEAR 0
score RCVD_IN_IADB_UT_CPR_30 0
score RCVD_IN_IADB_UT_CPR_MAT 0 -0.001 0 -0.052
score RCVD_IN_IADB_VOUCHED 0 -1.718 0 -0.956

score RCVD_IN_DNSWL_LOW  0 -0.6 0 -1.1
score RCVD_IN_DNSWL_MED  0 -1.5 0 -1.2
score RCVD_IN_DNSWL_HI   0 -1.8 0 -1.8
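
For anyone who wants to pull the same list out of a scores file, here is a
rough sketch. The function name active_scores is my own, not part of any
SpamAssassin tooling. It relies on the documented 'score' syntax: a single
value applies to all four score sets, and four values apply, in order, to
no Bayes and no network tests, network tests only, Bayes only, and both
Bayes and network tests. That is why columns 1 and 3 are 0 for the
network-based rules above.

```shell
# Hypothetical helper: print "NAME s0 s1 s2 s3" for every uncommented
# 'score' line in the given file(s). Commented-out lines are skipped
# because their first field is not the bare word "score".
active_scores() {
  awk '
    $1 == "score" && NF >= 3 {
      # Collect the values, stopping at an inline comment if there is one.
      n = 0
      for (i = 3; i <= NF; i++) {
        if ($i ~ /^#/) break
        v[++n] = $i
      }
      # A single value is used for all four score sets.
      if (n == 1) { v[2] = v[1]; v[3] = v[1]; v[4] = v[1] }
      # Lines with any other number of values are ignored here.
      if (n == 1 || n == 4) print $2, v[1], v[2], v[3], v[4]
    }
  ' "$@"
}
```

Something like `active_scores 50_scores_newest3.cf | grep DNSWL` would then
show only the dnswl.org rules, with the net-enabled scores in the third and
fifth output columns.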


> I was wondering why the dnswl.org rules have specifically lower scores than in
> previous versions - and extremely low scores. This is worrying me, as it would
> indicate we have a quality issue in the dnswl.org data.

These all have pretty low rank:

$ grep RCVD_IN_DNSWL_ freqs.full
OVERALL    SPAM%     HAM%     S/O    RANK   SCORE  NAME
  0.184   0.0005   0.5708    0.001   0.76   -1.80  RCVD_IN_DNSWL_HI
  7.410   0.1094  22.7527    0.005   0.67   -1.20  RCVD_IN_DNSWL_MED
  2.551   0.1810   7.5322    0.023   0.59   -1.10  RCVD_IN_DNSWL_LOW

The _HI rule probably gets a low automatic score because it hits very little
mail, so it likely needs manual tweaking. The _MED rule seems to hit too many
spam messages in the logs submitted for the rescoring runs, or perhaps it
overlaps heavily with other similar rules.
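
As a sanity check on the table, the S/O column can be recomputed from the
SPAM% and HAM% columns. The sketch below assumes S/O = SPAM% / (SPAM% + HAM%),
which reproduces all three values shown above, and assumes the column order
of the header line. The function name so_ratio is made up for illustration.

```shell
# Hypothetical helper: read freqs-style rows on stdin and print
# "NAME S/O" for rules whose name matches the given pattern.
# Assumed columns: OVERALL SPAM% HAM% S/O RANK SCORE NAME
so_ratio() {
  awk -v pat="$1" '
    $7 ~ pat && ($2 + $3) > 0 {
      # Share of the rule hits that landed on spam, using the percentages.
      printf "%s %.3f\n", $7, $2 / ($2 + $3)
    }
  '
}
```

Usage would be along the lines of `so_ratio RCVD_IN_DNSWL_ < freqs.full`.
A value close to 0 means that nearly all hits are on ham, which is what a
whitelist rule should show.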

It is quite possible that some of these hits are still false positives,
despite several iterations of cleaning:

for j in spam*.log; do printf '%s ' "$j"; grep -c RCVD_IN_DNSWL_HI "$j"
done | sort -k2nr

spam-bayes-net-bb-jhardin.log         3
spam-bayes-net-bb-kmcgrail.log        2
spam-bayes-net-bb-guenther_fraud.log  1
spam-bayes-net-hege.log               1

The same for _MED:

spam-bayes-net-bluestreak.log     381
spam-bayes-net-hege.log            79
spam-bayes-net-bb-jhardin.log      23
spam-bayes-net-wt-en1.log          15
spam-bayes-net-bb-kmcgrail.log     14
spam-bayes-net-jm-decimated.log    11
spam-bayes-net-ahenry.log           9
spam-bayes-net-dos-decimated.log    6
spam-bayes-net-bb-zmi.log           3
spam-bayes-net-mmartinec.log        3
spam-bayes-net-wt-en4.log           2
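
To see how strongly a single corpus dominates such a listing, the counts can
be turned into shares. This is only a sketch: rule_hits_by_log is a
hypothetical name, it counts log lines containing the rule name as a plain
substring, and it assumes that file names contain no colon.

```shell
# Hypothetical helper: for each log file, print "count share% file" for
# lines containing the rule name, largest count first. Files without any
# hit are left out.
rule_hits_by_log() {
  rule=$1; shift
  # /dev/null forces grep to prefix counts with the file name even when
  # only one log is given; its zero count is filtered out below.
  grep -c -- "$rule" "$@" /dev/null |
    awk -F: '
      $2 > 0 { n[$1] = $2; total += $2 }
      END {
        for (f in n) printf "%d %.1f%% %s\n", n[f], 100 * n[f] / total, f
      }
    ' |
    sort -k1,1nr
}
```

Called as `rule_hits_by_log RCVD_IN_DNSWL_MED spam*.log`. For the _MED
counts listed above, the hits add up to 546, so the bluestreak log alone
accounts for 381 of them, roughly 69.8%.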
