Bug Tracker item #3142744 was opened at 2010-12-23 11:31.
Message generated for change (Tracker Item Submitted) made by unwesen.
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=1126467&aid=3142744&group_id=250683
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jens Finkhaeuser (unwesen)
Assigned to: Nobody/Anonymous (nobody)
Summary: Undo whitelisting suggestion

Initial Comment:
As I mentioned in a different issue (which I only found while looking for this one), it seems very hard to train dspam *not* to whitelist a sender. It all seems to boil down to this code, around line 930 of libdspam.c:

    if (CTX->flags & DSF_WHITELIST) {
      if (ds_term->key == whitelist_token &&
          ds_term->s.spam_hits <= (ds_term->s.innocent_hits / 15) &&
          ds_term->s.innocent_hits > CTX->wh_threshold &&
          CTX->classification == DSR_NONE)
      {
        do_whitelist = 1;
      }
    }

The whitelist_token appears to be calculated from the sender address (or the From: line), so I understand the logic: if the sender is found and has at least 15 times as many innocent hits as spam hits (and more than wh_threshold innocent hits in total), the message is whitelisted. I'm leaving out a few details here.

I think that logic works well enough for deciding that a sender can be presumed innocent, but it doesn't work very well for suggesting that a sender might in fact not be a good candidate for whitelisting. The logic seems to be there because the whitelist_token's spam probability is hardcoded to 0.5 (in _ds_calc_stats).

Wouldn't it make much more sense to calculate its probability properly and use wh_threshold as a probability threshold, i.e. whitelist the message if the sender's spam probability is below 0.3 or whatever? That way the whitelist token goes through the same probability calculation as every other term, so dspam can be trained on it, but it is still treated as special: once it has been trained to be OK, the rest of the tokens are disregarded because the message is whitelisted.
I've attached a patch that compiles but is otherwise untested, mostly because I have no idea what ramifications the change might have outside the code I touched. Also, it changes the meaning and format of the wh_token config variable, which is most likely *not* what you want. But it'll convey what I mean better than writing more text :)

----------------------------------------------------------------------

_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel