Re: [dspam-users] won't learn?

Daniel Fisla Sat, 27 Jan 2007 18:19:29 -0800

I have a similar experience with Eastern European languages. No matter how many times I re-train in TOE mode, messages from the same users always get classified as SPAM. What's worse, since I cannot re-train, white-listing fails as well.

I am running in a shared group and get an overall accuracy of about 98.6%:


               TP True Positives:           6578
               TN True Negatives:          29488
               FP False Positives:           257
               FN False Negatives:           250
               SC Spam Corpusfed:             12
               NC Nonspam Corpusfed:           1
               TL Training Left:               0
               SHR Spam Hit Rate:         96.34%
               HSR Ham Strike Rate:        0.86%
               OCA Overall Accuracy:      98.61%
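
The three percentages can be reproduced from the four counts above. The sketch below is a minimal check; the formulas are an assumption that happens to match the reported figures, not something taken from the DSPAM source, and the function name `dspam_rates` is hypothetical.

```python
def dspam_rates(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (spam hit rate, ham strike rate, overall accuracy) as percentages.

    Assumed definitions (not confirmed against DSPAM itself):
      SHR = TP / (TP + FN)   share of spam that was caught
      HSR = FP / (TN + FP)   share of ham wrongly flagged as spam
      OCA = (TP + TN) / all  share of all messages classified correctly
    """
    shr = 100.0 * tp / (tp + fn)
    hsr = 100.0 * fp / (tn + fp)
    oca = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return shr, hsr, oca


# Counts copied from the statistics above.
shr, hsr, oca = dspam_rates(tp=6578, tn=29488, fp=257, fn=250)
print(f"SHR {shr:.2f}%  HSR {hsr:.2f}%  OCA {oca:.2f}%")
```

Under these assumed definitions the 257 false positives and 250 false negatives together account for the roughly 1.4% of messages that were misclassified.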

I know dspam tokenizes the emails and calculates probabilities, but these problems persist. The strange thing is that the encoding is us-ascii; only the language changes from English. I know the info here does not help much to determine the problem; I just wanted to know if I was alone with these problems.

-Daniel.


Patrick T. Tsang wrote:

I think most people who have never come across Chinese don't know how Chinese works.
The Chinese I am talking about is BIG5, GB2312, or UTF-8 (better).
Before MS$ enterprises the whole world, I don't think we will give up the GB2312 and BIG5 charsets.
If you look at BIG5 and GB2312, they use the same mapping table, or more likely the charsets occupy the same addresses in the table. They are both 2-byte codes sharing the same address range, only with different encodings.
There is no way for Dspam to see which is GB2312 and which is BIG5...
I cannot just "spam" GB2312 email, since BIG5 email will be "spammed" too.
I have seen too many cases of dspam failing to detect the correct encoding.
BTW, the biggest problem is the re-train process... no one here can tell...

Good luck
Patrick
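
The point about BIG5 and GB2312 sharing byte ranges can be illustrated with a short sketch. It uses only Python's built-in `big5` and `gb2312` codecs and says nothing about how DSPAM itself handles charsets; the helper name `misread` and the sample text are hypothetical.

```python
def misread(text: str, actual: str = "big5", assumed: str = "gb2312") -> str:
    """Encode text in one charset and decode the bytes as if they were another.

    errors="replace" keeps the sketch from raising when a byte pair has no
    meaning in the assumed charset.
    """
    return text.encode(actual).decode(assumed, errors="replace")


sample = "中文"  # hypothetical sample text
big5_bytes = sample.encode("big5")
gb_bytes = sample.encode("gb2312")

# Both encodings produce two bytes per character with a high lead byte,
# so the byte values alone do not say which charset was used.
print(big5_bytes.hex(), gb_bytes.hex())

# Decoding BIG5 bytes as GB2312 yields different text than was written.
print(misread(sample) == sample)
```

A classifier that works on raw bytes would therefore see unrelated tokens for the same Chinese text depending on the charset the sender used.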



----- Original Message -----
From: "Dov Zamir" <[EMAIL PROTECTED]>
To: "Patrick T. Tsang" <[EMAIL PROTECTED]>
Cc: "Kent Tong" <[EMAIL PROTECTED]>;<[email protected]>
Sent: Saturday, January 27, 2007 3:42 PM
Subject: Re: [dspam-users] won't learn?
Quoting Patrick T. Tsang:
Hello Kent,

I have the same problem.
And I have given up on Dspam already. The result is not good, and the maintenance is too difficult to deal with.
No one here can answer my re-learn problem...
I think Dspam has a good idea for handling spam, but it is not designed for Chinese.
Patrick,
I don't think that is correct. DSPAM tokenizes the email; there is no concept of language. It works just fine with Hebrew for my setup, so why would it not work with Chinese?
Good luck
Patrick



----- Original Message -----
From: "Kent Tong" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Saturday, January 27, 2007 11:21 AM
Subject: Re: [dspam-users] won't learn?
Marcin Krol wrote:
1. Try looking up the DSPAM factors in the message headers,
(you can view full message by pressing Ctrl-U in Thunderbird
or F9 in The Bat), the headers may give you some clue?
I just found out even for a spam correctly identified as spam, if
I classify it again, it will say it's innocent. If I delete the
headers generated by dspam (including the "Received by:" headers
it and Cyrus generated), then it will classify it as spam.

However, for a spam that wasn't identified, even after training it,
dspam is still classifying its header-removed version as innocent.
2. Have you changed the default spam-probability algorithms
in dspam.conf? You could tweak those and see what changes.
No.
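
The header-removal experiment described under point 1 could be repeated consistently with a small script. This is a minimal sketch using Python's standard `email` module; it assumes the headers DSPAM adds all start with `X-DSPAM-`, the function name `strip_dspam_headers` is hypothetical, and it does not touch the "Received" headers mentioned above.

```python
from email import message_from_string


def strip_dspam_headers(raw: str) -> str:
    """Return the message with every header whose name starts with X-DSPAM- removed.

    Assumption: DSPAM's added headers all use the X-DSPAM- prefix.
    """
    msg = message_from_string(raw)
    for name in list(msg.keys()):
        if name.lower().startswith("x-dspam-"):
            del msg[name]  # removes all occurrences of that header
    return msg.as_string()


# Hypothetical sample message for illustration only.
raw = (
    "From: sender@example.com\n"
    "X-DSPAM-Result: Spam\n"
    "Subject: test\n"
    "\n"
    "body text\n"
)
print(strip_dspam_headers(raw))
```

Feeding the stripped copy back to the classifier gives a message closer to what was seen on first delivery, which makes the before and after comparison easier to repeat.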

--
Kent Tong
Useful news for CIO's at http://www2.cpttm.org.mo/cyberlab/cio-news
_________________________________________________________________________
This message has been scanned by Kibbutz Beit Kama's Anti Virus software,
and is believed to be clean of any viruses.
_________________________________________________________________________



!DSPAM:8,45bb28ab81291006769230!
