On 16.08.2012 19:50, Christophe Garault wrote:
Hello guys,

Hello Christophe,


I'm still in the process of installing DSPAM with the PostgreSQL backend.
Yesterday I ran dspam_train with several thousand spam and ham messages:

Spam AND Ham? Really?

    dspam # dspam_stats
    christo...@garault.org TP:     0 TN:  5139 FP:     0 FN:  4868 SC:     0 NC:     0


So you have here 5'139 messages that got classified as HAM ([T]rue [N]egative) and 4'868 messages that got falsely classified as HAM ([F]alse [N]egative). Somehow this is very, very, very, very strange. How can DSPAM end up with only TN and FN counts after processing almost 10K messages, and not a single TP or FP?
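To make the counters easier to reason about, here is a minimal Python sketch that parses the dspam_stats line quoted above and derives two ratios from it. The helper names (parse_stats, summarize) and the ratio names are made up for illustration and are not part of DSPAM; the sketch only assumes that TP/FN count messages that really were spam and TN/FP count messages that really were ham.

```python
import re

# Sample line copied from the dspam_stats output quoted above.
STATS_LINE = (
    "christo...@garault.org TP:     0 TN:  5139 FP:     0 "
    "FN:  4868 SC:     0 NC:     0"
)


def parse_stats(line):
    """Return the two-letter counters of one dspam_stats line as ints."""
    pairs = re.findall(r"\b([A-Z]{2}):\s*(\d+)", line)
    return {key: int(value) for key, value in pairs}


def summarize(counters):
    """Derive simple ratios from the TP/TN/FP/FN counters."""
    spam_seen = counters["TP"] + counters["FN"]  # messages that really were spam
    ham_seen = counters["TN"] + counters["FP"]   # messages that really were ham
    return {
        "total": spam_seen + ham_seen,
        # Share of real spam that was classified as spam.
        "spam_caught_ratio": counters["TP"] / spam_seen if spam_seen else None,
        # Share of real ham that was classified as ham.
        "ham_passed_ratio": counters["TN"] / ham_seen if ham_seen else None,
    }


counters = parse_stats(STATS_LINE)
summary = summarize(counters)
print(counters)
print(summary)
```

With the numbers above, every one of the 4'868 spam messages was missed and every ham message passed, which is what you would see if DSPAM called everything innocent.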

Can I make a guess? You are using sbph as Tokenizer.

Something is fishy on your setup. Can you please post your dspam.conf?


I have now more than 4 million lines in dspam_token_data for this user (me).

This is a lot. Just for slightly more than 10K messages?
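As a rough back-of-the-envelope check, using only the figures already posted (4 million is a lower bound, since you wrote "more than 4 million"), that works out to roughly this many token rows per trained message on average:

```python
# Figures taken from the post; token_rows is a lower bound ("more than 4 million").
token_rows = 4_000_000
messages = 5139 + 4868  # TN + FN from the dspam_stats output

rows_per_message = token_rows / messages
print(f"{messages} messages, about {rows_per_message:.0f} token rows per message")
```

This is only an average over distinct rows in dspam_token_data; it says nothing about how many tokens a single message actually produced.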


Today a spam was delivered as an innocent message. I tried to 'retrain' it by sending it to the spam alias but didn't receive any acknowledgement, despite the text I wrote in /var/spool/dspam/txt/msgtag.spam.
Also the userPref (spamAction) is 'tag'.
Here are my questions:

Is there any reason why I haven't received this retrained message with the associated tag in the Subject ([SPAM])? How do I know that DSPAM will treat similar messages as spam when there's no column in dspam_signature_data that classifies the message?
Were the tokens of this particular message changed instead?
And where are those tokens stored (I mean in a clear, readable format)? Is it the data column of dspam_signature_data? I may well have a problem with that, because the database encoding is UTF8 and while dspam_train was running my logs were full of errors like this one:

    2012-08-15 17:57:01 CEST ERREUR:  séquence d'octets invalide pour l'encodage « UTF8 » : 0xe93634
    (which means: invalid byte sequence for encoding "UTF8")
    2012-08-15 17:57:01 CEST INSTRUCTION :  INSERT INTO dspam_signature_data (uid,signature,length,created_on,data) VALUES (1,E'502bc6cd31541794952939',2304,CURRENT_DATE,E'\xe964b99787[etc.....]

What version of DSPAM is that?
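Independently of the DSPAM version, the byte sequence in that error message can be checked on its own. The Python sketch below only shows that the three bytes 0xe9 0x36 0x34 are not valid UTF-8; the helper name is_valid_utf8 is made up for illustration. Whether those bytes are Latin-1 text or simply raw binary signature data cannot be told from the log alone, so the Latin-1 decoding is shown purely for comparison.

```python
# The three bytes PostgreSQL complained about (0xe93634 in the log).
raw = bytes.fromhex("e93634")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xe9 starts a three-byte UTF-8 sequence, but 0x36 and 0x34 are plain
# ASCII digits, not continuation bytes, so decoding fails.
print(is_valid_utf8(raw))

# For comparison only: read as Latin-1, the same bytes are 'é' + '6' + '4'.
print(raw.decode("latin-1"))

# In UTF-8, 'é' is encoded as two bytes instead.
print("é".encode("utf-8").hex())
```

So the server is being handed bytes that cannot be text in a UTF8 database, which matches the error you quoted.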

To make things worse, I've just run dspam on the command line from within my Maildir directory, with no more success:

    dspam --user christo...@garault.org --class=spam --source=error
    --deliver=spam  < spammessage.txt

No message, nothing...

And to finish this long post (I apologize): there's a typo in this spam. It says 'émail' instead of 'email', which is not a frequent mistake. So if I look for that token with the dump tool, this is what I get:

    dspam # dspam_dump christo...@garault.org émail
    5842976176687468544  S: 00000  I: 00000  P: 0.4000

Doesn't that mean the token was not updated by my previous commands?



Thanks in advance to all those willing to help.

Christophe Garault





_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user


--
 Kind Regards from Switzerland,

 Stevan Bajić
