[Dspam-devel] [ dspam-Bug Tracker-3141675 ] Unlearning a message is broken

SourceForge.net Sat, 14 May 2011 02:11:54 -0700

Bug Tracker item #3141675, was opened at 2010-12-22 13:03
Message generated for change (Comment added) made by unwesen
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=1126467&aid=3141675&group_id=250683


Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: daemon
Group: v3.9.0
>Status: Open
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Jens Finkhaeuser (unwesen)
Assigned to: Stevan Bajic (sbajic)
Summary: Unlearning a message is broken

Initial Comment:
I've been trying to figure out why it seems impossible to teach dspam to 
unlearn whitelisting. While doing so, I've stumbled across a bug that appears 
to be unrelated.

In libdspam.c, ca. line 1280:

      if (CTX->classification == DSR_ISINNOCENT)
      {
        if (CTX->flags & DSF_UNLEARN)
        {
          if (CTX->classification == DSR_ISSPAM)
          {
            if (occurrence)
            {
              ds_term->s.innocent_hits -= ds_term->frequency;
              if (ds_term->s.innocent_hits < 0)
                ds_term->s.innocent_hits = 0;
            } else {
              ds_term->s.innocent_hits -= (ds_term->s.innocent_hits>0) ? 1:0;
            }
          }

So if the email is classified as innocent, and the unlearn flag is set, then 
check whether the email is spam before doing anything. How can that be right? 
At this point, the email is already known to be innocent, so the check for 
whether it's spam must always fail.

A similar issue can be found a few lines further down, ca. 1325.
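
The dead branch can be reproduced in isolation. The following is a minimal sketch, not libdspam code: the constant values and the struct ctx type are hypothetical stand-ins, and only the shape of the nested conditions is taken from the snippet above.

```c
/* Hypothetical stand-ins; the real definitions live in libdspam's headers. */
enum { DSR_ISSPAM = 1, DSR_ISINNOCENT = 2 };
enum { DSF_UNLEARN = 0x01 };

struct ctx {
    int classification;
    unsigned flags;
};

/* Returns 1 if the decrement branch from the report would run, else 0. */
static int reported_branch_reached(const struct ctx *CTX)
{
    if (CTX->classification == DSR_ISINNOCENT) {
        if (CTX->flags & DSF_UNLEARN) {
            if (CTX->classification == DSR_ISSPAM) {
                /* Unreachable: classification was just tested to be innocent. */
                return 1;
            }
        }
    }
    return 0;
}
```

With these conditions the function returns 0 for every input, which is the behaviour described above: the innocent counters are never decremented on unlearn.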

----------------------------------------------------------------------

>Comment By: Jens Finkhaeuser (unwesen)
Date: 2011-05-14 10:09

Message:
Ah! Sorry, I forgot about this! Yes! I've been running this patch since
Christmas, and it seems to work better for me.

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2011-02-09 20:39

Message:
Hello Jens. It has been some time since Christmas. Have you had time to check
whether unlearning now works as expected?

----------------------------------------------------------------------

Comment By: Jens Finkhaeuser (unwesen)
Date: 2010-12-23 16:24

Message:
Well, I've not actually tried to use DSF_UNLEARN, it's just that I noticed
the conditions being weird when looking at the code. I can give it a spin,
but probably after Christmas :)

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2010-12-23 15:24

Message:
Yes, you are right. The original code is wrong. That is the reason I posted
a patch. Please try applying it and tell me whether the modified code is
better and does what you expect it to do.

----------------------------------------------------------------------

Comment By: Jens Finkhaeuser (unwesen)
Date: 2010-12-23 11:40

Message:
Yes, my phrasing was bad, sorry.

I understand DSF_UNLEARN to change the spam/innocent counts for tokens,
not to actually remove the tokens.

My point really is that checking for DSR_ISINNOCENT, and then also checking
for DSR_ISSPAM can't be right as I understand the code :)

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2010-12-23 11:23

Message:
DSF_UNLEARN does not remove signatures. Probably you meant to write/say
that DSF_UNLEARN is removing tokens? But even this is not done. DSF_UNLEARN
is just unlearning a learning phase from the past. So if you have token A
with Spam count of 10 and you DSF_UNLEARN token A from the Spam class then
the counter will be decreased and this then leads to token A with a Spam
count of 9. And the same goes for the Innocent class. If you have token B
with an Innocent count of 20 and you DSF_UNLEARN token B from the Innocent
class then the token B will have an Innocent count of 19.
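
The counter arithmetic described here can be sketched as follows. This is an illustration only: struct token_counts and the helper names are hypothetical, and the floor at zero is borrowed from the clamp in the snippet quoted in the initial comment.

```c
/* Hypothetical per-token counters, not the libdspam structure. */
struct token_counts {
    long spam_hits;
    long innocent_hits;
};

/* Decrease a counter by one, never going below zero. */
static long dec_floor(long hits)
{
    return hits > 0 ? hits - 1 : 0;
}

/* Unlearn one earlier spam training of this token. */
static void unlearn_spam(struct token_counts *t)
{
    t->spam_hits = dec_floor(t->spam_hits);
}

/* Unlearn one earlier innocent training of this token. */
static void unlearn_innocent(struct token_counts *t)
{
    t->innocent_hits = dec_floor(t->innocent_hits);
}
```

Starting from a Spam count of 10, unlearn_spam leaves 9; starting from an Innocent count of 20, unlearn_innocent leaves 19. The token itself stays in place in both cases.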

Deleting the whole token A or B would be something like DSF_FORGET (this
btw does not exist) rather than DSF_UNLEARN.

And DSF_UNLEARN is not reclassifying in the traditional way. Usually
reclassifying will change one of the Spam or Innocent hits and at the same
time change one of the Spam or Innocent misses. A DSF_UNLEARN just changes
the first part (the hits) but not the second part (the misses).

----------------------------------------------------------------------

Comment By: Jens Finkhaeuser (unwesen)
Date: 2010-12-23 11:03

Message:
Hmm, so, a clarification: I'm not sure if what I found there actually has
to do with unlearning whitelisting, I just found it in my search.

As far as I understand - and I've only just looked into the code - the
DSF_UNLEARN flag is for removing signatures rather than reclassifying them.
If that's true, then removing the second check for classification ==
DSR_ISSPAM should be all that's required to fix it. And the same for the
spam case further down, around 1325.
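
A rough sketch of that suggestion, with the inner classification check dropped. This is not the patch attached to the tracker item: the types, the constant values, and the assumption that the spam case mirrors the innocent one by decrementing spam_hits are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real definitions live in libdspam's headers. */
enum { DSR_ISSPAM = 1, DSR_ISINNOCENT = 2 };
enum { DSF_UNLEARN = 0x01 };

struct stat_sketch { long innocent_hits; long spam_hits; };
struct term_sketch { struct stat_sketch s; long frequency; };

/* Decrement the counter matching the classification when unlearning. */
static void unlearn_term(int classification, unsigned flags,
                         int occurrence, struct term_sketch *t)
{
    long *hits = NULL;

    if (!(flags & DSF_UNLEARN))
        return;                       /* only the unlearn path is sketched */

    if (classification == DSR_ISINNOCENT)
        hits = &t->s.innocent_hits;
    else if (classification == DSR_ISSPAM)
        hits = &t->s.spam_hits;       /* assumed mirror of the innocent case */
    else
        return;

    if (occurrence) {
        *hits -= t->frequency;        /* subtract every occurrence */
        if (*hits < 0)
            *hits = 0;
    } else if (*hits > 0) {
        *hits -= 1;                   /* subtract a single hit */
    }
}
```

The decrement logic is the same as in the reported snippet; only the contradictory inner test is gone, so the branch can actually run.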

If that's not true, well, then I don't understand the code enough yet :)

As I understand it, it's the case where DSF_UNLEARN is *not* set that
reclassifies messages, and that might therefore influence whitelisting. But
that's really another problem, so I'll open another issue for that.

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2010-12-23 07:12

Message:
Hello Jens, can you check if the included patch would fix the issue?

----------------------------------------------------------------------

_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel
