Stevan Bajić wrote:
Hello, Stevan.

Hello Carlo,


I would say that invalid messages such as those you mentioned should be put into quarantine and tagged as 'Invalid' or 'Corrupt',

That could be a problem, in the sense that the operator of DSPAM could have 
decided NOT to use the quarantine functionality, and if we enforce delivery 
to quarantine then we break that rule. We would also blow up the quarantine for 
nothing if someone is doing large-scale training where such messages could be 
in the training set, and a forced delivery to quarantine is (probably) not what 
the trainer is looking for.

No, I didn't mean to 'force' them into quarantine. Only to deliver them into quarantine, _if quarantine is being used_.

And during training (dspam_train), quarantine is not used, anyway.
So during training, I would just ignore the message, write "Corrupt message" or something, and move on to the next one.
so the user can decide to receive them later. In fact, this is how dspam handles viruses, right?

I am not sure. It has been some time since I used anti-virus inside DSPAM. Does 
an infected message really get delivered into quarantine? I had the impression 
that virus-infected messages get tagged and then, if the user has enabled 
quarantine, THEN they get delivered into quarantine, but are not FORCED to be 
delivered into quarantine.


That's what I meant; to treat them the same way as viruses. If quarantine is active, quarantine them (just in case someone wants to inspect them). If not, just tag the message as Invalid in webui and discard the message.

No tokenizing, just put in quarantine and tagged.

Yes. No tokenization is done for Virus infected mails. But I am not sure about 
the forced delivery to quarantine.
They're not forced. I think.
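
To make the proposal concrete, here is a minimal Python sketch of the handling discussed above. It is not DSPAM code: the function name `handle_corrupt_message`, the `Action` values, and the `quarantine_enabled` / `training` flags are all hypothetical names chosen for illustration.

```python
from enum import Enum


class Action(Enum):
    QUARANTINE = "quarantine"        # keep the tagged message for later inspection
    TAG_AND_DISCARD = "tag_discard"  # record it as 'Invalid', do not keep the message
    SKIP = "skip"                    # training run: report it and move on


def handle_corrupt_message(quarantine_enabled: bool, training: bool) -> Action:
    """Decide what to do with a message that could not be parsed.

    Hypothetical policy mirroring the thread: the message is never tokenized,
    and quarantine is never forced on an operator who has it disabled.
    """
    if training:
        # Training does not use quarantine; just skip the corrupt message.
        return Action.SKIP
    if quarantine_enabled:
        return Action.QUARANTINE
    return Action.TAG_AND_DISCARD
```

The point of the sketch is only that quarantine delivery depends on the existing setting and is never forced.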

What would be the advantage of tokenizing such corrupt messages?

I don't see a big benefit (if any) in tokenizing such a corrupt message. BUT 
I don't like the error one gets when such a message is processed. I would like 
cleaner handling than the error. Assume one has an MTA that (for 
whatever reason) is accepting such a corrupt message. And assume the MTA is 
processing that message with DSPAM over a pipe. Then an error 22 is going to 
instruct the MTA to produce an NDR or such, and this is something I would like to 
avoid (if possible).

I see. But the only way to avoid that is to return no error, right?
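
A rough Python sketch of that trade-off, under stated assumptions: `parse`, `CorruptMessage`, `exit_status_for` and the `fail_on_corrupt` flag are hypothetical names, and the stand-in parser is far simpler than anything DSPAM does. The only value taken from the thread is the status 22.

```python
import logging

log = logging.getLogger("filter-sketch")


class CorruptMessage(Exception):
    """Hypothetical error raised when a message cannot be parsed."""


def parse(raw: bytes) -> bytes:
    # Stand-in parser: a message without a header/body separator is "corrupt".
    if b"\n\n" not in raw and b"\r\n\r\n" not in raw:
        raise CorruptMessage("no header/body separator")
    return raw


def exit_status_for(raw: bytes, fail_on_corrupt: bool = False) -> int:
    """Return the exit status the calling MTA would see for this message."""
    try:
        parse(raw)
    except CorruptMessage as exc:
        if fail_on_corrupt:
            # Behaviour described in the thread: the MTA sees a failure
            # and may generate an NDR.
            return 22
        # Alternative: log the problem but report success to the MTA.
        log.warning("corrupt message not filtered: %s", exc)
        return 0
    return 0
```

With `fail_on_corrupt=False` the pipe always ends with status 0, so the decision about the corrupt message has to be recorded somewhere else (a log line here).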

btw: I was not only writing about tokenizing a message. I am thinking about 
classification as well. Sure, classification needs to tokenize the message in 
order to be able to compute a result, but classification does not mean that the 
tokens need to be saved. Just evaluated, but not necessarily saved.
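
The distinction between evaluating and saving tokens can be sketched as follows. This is a toy illustration, not DSPAM's algorithm: the whitespace tokenizer, the averaged per-token score, and the names `TokenDB`, `classify` and `learn` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# token -> (spam_hits, innocent_hits); a toy stand-in for a token database
TokenDB = Dict[str, Tuple[int, int]]


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def classify(text: str, db: TokenDB) -> float:
    """Return a toy spam score in [0, 1]; reads db but never writes to it."""
    probs = []
    for tok in tokenize(text):
        spam, innocent = db.get(tok, (0, 0))
        total = spam + innocent
        if total:
            probs.append(spam / total)
    # A message with no known tokens gets a neutral score.
    return sum(probs) / len(probs) if probs else 0.5


def learn(text: str, db: TokenDB, is_spam: bool) -> None:
    """Separate step: only this function stores tokens in db."""
    for tok in tokenize(text):
        spam, innocent = db.get(tok, (0, 0))
        db[tok] = (spam + 1, innocent) if is_spam else (spam, innocent + 1)
```

Calling `classify` alone yields a result while leaving the database untouched; tokens are only persisted if `learn` is called afterwards.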

btw2: I am already happy that we were able to reduce the amount of failures 
from those 3.2% down to 0.17% on the TREC05 corpus. I have not tested 3.8.0 to 
see how it behaves on those messages, but I would say that 3.8.0 cannot be much 
better than 3.9.0. I am 100% sure that 3.9.0 does a better job of parsing the 
message than 3.8.0, but that is another issue. I am more interested to see if 
3.8.0 has a lower failure rate than 3.9.0 with the same options. I will 
probably go ahead and install 3.8.0 on a test system and compare them to ensure 
that we are not worse with 3.9.0 than with 3.8.0.


3.2% to 0.17% -> almost 19 times more effective.
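
For reference, the ratio behind that figure, using only the two failure rates quoted in the thread (the variable names are just for illustration):

```python
before = 3.2   # failure rate in percent, as quoted above
after = 0.17   # failure rate in percent, as quoted above

factor = before / after
print(f"{factor:.1f}x fewer failures")  # prints "18.8x fewer failures"
```
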
Best Regards,
Carlo Rodrigues


_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel
