Bug Tracker item #3141675, was opened at 2010-12-22 13:03
Message generated for change (Comment added) made by unwesen
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1126467&aid=3141675&group_id=250683

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: daemon
Group: v3.9.0
>Status: Open
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Jens Finkhaeuser (unwesen)
Assigned to: Stevan Bajic (sbajic)
Summary: Unlearning a message is broken

Initial Comment:
I've been trying to figure out why it seems impossible to teach dspam to unlearn whitelisting. While doing so, I've stumbled across a bug that appears to be unrelated.

In libdspam.c, ca. line 1280:

    if (CTX->classification == DSR_ISINNOCENT) {
      if (CTX->flags & DSF_UNLEARN) {
        if (CTX->classification == DSR_ISSPAM) {
          if (occurrence) {
            ds_term->s.innocent_hits -= ds_term->frequency;
            if (ds_term->s.innocent_hits < 0)
              ds_term->s.innocent_hits = 0;
          } else {
            ds_term->s.innocent_hits -= (ds_term->s.innocent_hits>0) ? 1:0;
          }
        }

So if the email is classified as innocent, and the unlearn flag is set, then check whether the email is spam before doing anything. How can that be right? At this point, the email is already known to be innocent, so the check for whether it's spam must always fail.

A similar issue can be found a few lines further down, ca. line 1325.

----------------------------------------------------------------------

>Comment By: Jens Finkhaeuser (unwesen)
Date: 2011-05-14 10:09

Message:
Ah! Sorry, I forgot about this! Yes! I've been running this patch since Christmas, and it seems to work better for me.

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2011-02-09 20:39

Message:
Hello Jens. It's been some time since Christmas. Have you had time to check whether the unlearn is now working as expected?

----------------------------------------------------------------------

Comment By: Jens Finkhaeuser (unwesen)
Date: 2010-12-23 16:24

Message:
Well, I've not actually tried to use DSF_UNLEARN, it's just that I noticed the conditions being weird when looking at the code. I can give it a spin, but probably after Christmas :)

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2010-12-23 15:24

Message:
Yes. You are right. The original code is wrong. That is the reason I posted a patch. Could you try applying it and tell me whether the modified code is better and does what you expect it to do?

----------------------------------------------------------------------

Comment By: Jens Finkhaeuser (unwesen)
Date: 2010-12-23 11:40

Message:
Yes, my phrasing was bad, sorry. I understand DSF_UNLEARN to change the spam/innocent counts for tokens, not to actually remove the tokens.

My point really is that checking for DSR_ISINNOCENT, and then also checking for DSR_ISSPAM can't be right as I understand the code :)

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2010-12-23 11:23

Message:
DSF_UNLEARN does not remove signatures. Probably you meant to write/say that DSF_UNLEARN is removing tokens? But even this is not done. DSF_UNLEARN is just unlearning a learning phase from the past. So if you have token A with a Spam count of 10 and you DSF_UNLEARN token A from the Spam class, then the counter will be decreased, which leaves token A with a Spam count of 9. And the same goes for the Innocent class. If you have token B with an Innocent count of 20 and you DSF_UNLEARN token B from the Innocent class, then token B will have an Innocent count of 19.

Deleting the whole token A or B would be something like DSF_FORGET (this btw does not exist) rather than DSF_UNLEARN.

And DSF_UNLEARN is not reclassifying in the traditional way.
Usually reclassifying will change one of the Spam or Innocent hits and at the same time change one of the Spam or Innocent misses. A DSF_UNLEARN just changes the first part (the hits) but not the second part (the misses).

----------------------------------------------------------------------

Comment By: Jens Finkhaeuser (unwesen)
Date: 2010-12-23 11:03

Message:
Hmm, so, a clarification: I'm not sure if what I found there actually has to do with unlearning whitelisting, I just found it in my search.

As far as I understand - and I've only just looked into the code - the DSF_UNLEARN flag is for removing signatures rather than reclassifying them. If that's true, then removing the second check for classification == DSR_ISSPAM should be all that's required to fix it. And the same for the spam case further down, around line 1325.

If that's not true, well, then I don't understand the code enough yet :)

As I understand it, it's the case where DSF_UNLEARN is *not* set that reclassifies messages, and that might therefore influence whitelisting. But that's really another problem, so I'll open another issue for that.

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2010-12-23 07:12

Message:
Hello Jens,

can you check if the included patch would fix the issue?

----------------------------------------------------------------------

_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel
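
For reference, the behaviour discussed in this thread can be sketched as a small standalone C fragment. This is only an illustration under stated assumptions, not the libdspam source and not the patch attached to the tracker item (the patch is not reproduced in this message). The struct layouts, the numeric values of DSR_ISSPAM, DSR_ISINNOCENT and DSF_UNLEARN, and the function names are hypothetical stand-ins. The first function mirrors the shape of the condition quoted in the initial comment; the second drops the inner DSR_ISSPAM check, as suggested in the 2010-12-23 11:03 comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real libdspam definitions may differ. */
enum { DSR_ISSPAM = 1, DSR_ISINNOCENT = 2 };  /* values are assumptions */
enum { DSF_UNLEARN = 0x01 };                  /* value is an assumption */

struct term_stat { long innocent_hits; long spam_hits; };
struct term      { struct term_stat s; long frequency; };
struct context   { int classification; unsigned flags; };

/* Decrement the innocent hit counter, never letting it drop below zero.
 * With 'occurrence' set, subtract the term frequency; otherwise subtract 1. */
static void decrement_innocent(struct term *t, int occurrence)
{
    if (occurrence) {
        t->s.innocent_hits -= t->frequency;
        if (t->s.innocent_hits < 0)
            t->s.innocent_hits = 0;
    } else {
        t->s.innocent_hits -= (t->s.innocent_hits > 0) ? 1 : 0;
    }
}

/* Shape of the condition as quoted in the report: the innermost test
 * contradicts the outermost one, so the decrement is never reached. */
void unlearn_innocent_as_reported(const struct context *ctx,
                                  struct term *t, int occurrence)
{
    assert(ctx != NULL && t != NULL);
    if (ctx->classification == DSR_ISINNOCENT) {
        if (ctx->flags & DSF_UNLEARN) {
            if (ctx->classification == DSR_ISSPAM) {  /* always false here */
                decrement_innocent(t, occurrence);
            }
        }
    }
}

/* Variant with the inner classification check removed, so unlearning an
 * innocent message lowers the innocent hit count. */
void unlearn_innocent_without_inner_check(const struct context *ctx,
                                          struct term *t, int occurrence)
{
    assert(ctx != NULL && t != NULL);
    if (ctx->classification == DSR_ISINNOCENT) {
        if (ctx->flags & DSF_UNLEARN) {
            decrement_innocent(t, occurrence);
        }
    }
}
```

With a term whose innocent count is 10 and an innocent, unlearn-flagged context, the first function leaves the count at 10, while the second lowers it to 9. Only the hit counter is touched in either case, which matches the description of DSF_UNLEARN given earlier in the thread.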