Stevan Bajić wrote:
Hello, Stevan.
Hallo Carlo,
I would say that invalid messages such as those you mentioned should be put
into quarantine and tagged as 'Invalid' or 'Corrupt',
That could be a problem. The operator of DSPAM could have decided NOT to use
the quarantine functionality, and if we enforce delivery to quarantine then we
break that choice. We would also blow up the quarantine for nothing if someone
is doing large-scale training where such messages could be in the training set;
a forced delivery to quarantine is (probably) not what the trainer is looking
for.
No, I didn't mean to 'force' them into quarantine. Only to deliver them
into quarantine, _if quarantine is being used_.
And during training (dspam_train), quarantine is not used, anyway.
So during training, I would just ignore the message, write "Corrupt
message" or something, and move on to the next one.
so the user can
decide to receive them later. In fact, this is how dspam handles
viruses, right?
I am not sure. It has been some time since I used Anti-Virus inside DSPAM. Does
an infected message really get delivered into quarantine? I had the impression
that virus-infected messages get tagged and then, if the user has enabled
quarantine, THEN they get delivered into quarantine, but are not FORCED to be
delivered into quarantine.
That's what I meant; to treat them the same way as viruses. If
quarantine is active, quarantine them (just in case someone wants to
inspect them). If not, just tag the message as Invalid in webui and
discard the message.
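
The handling proposed above can be sketched as a small decision function. This is only an illustration of the policy discussed in this thread, not DSPAM code: the names `Disposition` and `handle_corrupt_message` and both flags are hypothetical.

```python
from enum import Enum


class Disposition(Enum):
    """Hypothetical outcomes for a message that could not be parsed."""
    QUARANTINE = "quarantine"            # keep it so someone can inspect it later
    TAG_AND_DISCARD = "tag_and_discard"  # mark as Invalid in the web UI, then drop
    SKIP = "skip"                        # training run: report and move on


def handle_corrupt_message(quarantine_enabled: bool, training: bool) -> Disposition:
    """Decide what to do with a corrupt message. No branch tokenizes it."""
    if training:
        # Quarantine is not used during training, so just skip the message.
        return Disposition.SKIP
    if quarantine_enabled:
        # Same treatment as proposed for viruses: only if quarantine is in use.
        return Disposition.QUARANTINE
    return Disposition.TAG_AND_DISCARD
```

Nothing here forces a delivery: with quarantine turned off, the message is only tagged and discarded.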
No tokenizing, just put in quarantine and tagged.
Yes. No tokenization is done for Virus infected mails. But I am not sure about
the forced delivery to quarantine.
They're not forced. I think.
What
would be the advantage of tokenizing such corrupt messages?
I don't see a big benefit (if any at all) in tokenizing such a corrupt message.
BUT I don't like the error one gets when such a message is processed. I would
like cleaner handling than the error. Assume one has an MTA that (for whatever
reason) accepts such a corrupt message, and assume the MTA is processing that
message with DSPAM over a pipe. Then error 22 is going to instruct the MTA to
produce an NDR or similar, and this is something I would like to avoid (if
possible).
I see. But the only way to avoid that is to return no error, right?
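
To make the "return no error" idea concrete, here is a minimal sketch of how the status seen by the MTA could be separated from the internal parse failure. The value 22 is the one quoted in this thread; the function name and the `suppress_corrupt_error` switch are hypothetical and not existing DSPAM options.

```python
CORRUPT_MESSAGE_STATUS = 22  # error value quoted in this thread


def status_for_mta(filter_status: int, suppress_corrupt_error: bool = True) -> int:
    """Map the filter's internal status to the exit status the MTA sees.

    Returning 0 tells the MTA the message was handled, so no NDR is
    generated. Every other status is passed through unchanged.
    """
    if suppress_corrupt_error and filter_status == CORRUPT_MESSAGE_STATUS:
        return 0
    return filter_status
```

Returning 0 only makes sense if the filter has already disposed of the message itself (delivered, quarantined, or discarded it), since the MTA will consider it done.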
btw: I was not only writing about tokenizing a message. I am thinking about
classification as well. Sure, a classification needs to tokenize the message in
order to be able to compute a result, but classification does not mean that the
tokens need to be saved. Just evaluated, but not necessarily saved.
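
The difference between evaluating tokens and saving them can be shown with a toy example. This is deliberately not DSPAM's tokenizer or scoring algorithm; it is a minimal smoothed log-odds sketch with hypothetical names, where `spam_score` only reads the token store and `learn` is the only function that writes to it.

```python
import math
import re


def tokenize(text):
    """Very naive tokenizer: lowercase words and numbers."""
    return re.findall(r"[a-z0-9']+", text.lower())


def spam_score(text, store):
    """Classify only: look tokens up, never write them back."""
    log_odds = 0.0
    for tok in set(tokenize(text)):
        spam, ham = store.get(tok, (0, 0))
        # Add-one smoothing so unseen tokens contribute nothing.
        log_odds += math.log((spam + 1) / (ham + 1))
    return 1.0 / (1.0 + math.exp(-log_odds))


def learn(text, store, is_spam):
    """Training: this is the only place the store is modified."""
    for tok in set(tokenize(text)):
        spam, ham = store.get(tok, (0, 0))
        store[tok] = (spam + 1, ham) if is_spam else (spam, ham + 1)
```

A corrupt message could go through something like `spam_score` to get a result without ever reaching `learn`, so its tokens would not end up in the database.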
btw2: I am already happy that we were able to reduce the amount of failures
from those 3.2% down to 0.17% on the TREC05 corpus. I have not tested 3.8.0 to
see how it behaves on those messages, but I would say that 3.8.0 cannot be much
better than 3.9.0. I am 100% sure that 3.9.0 does a better job in parsing the
message than 3.8.0, but that is another issue. I am more interested to see if
3.8.0 has a lower failure rate than 3.9.0 with the same options. I will
probably go ahead and install 3.8.0 on a test system and compare them to ensure
that we are not worse with 3.9.0 than with 3.8.0.
3.2% to 0.17% -> almost 19 times more effective.
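
For reference, the arithmetic behind that figure, using only the two failure rates quoted above (the variable names are just for illustration):

```python
old_rate = 3.2   # failure rate in percent before the fixes (TREC05 corpus)
new_rate = 0.17  # failure rate in percent after the fixes

factor = old_rate / new_rate                         # how many times fewer failures
reduction = (old_rate - new_rate) / old_rate * 100   # relative reduction in percent

print(f"{factor:.1f}x fewer failures")               # prints "18.8x fewer failures"
print(f"{reduction:.1f}% of the failures removed")   # prints "94.7% of the failures removed"
```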
Best Regards,
Carlo Rodrigues
_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel