On 16.08.2012 19:50, Christophe Garault wrote:
Hello guys,
Hello Christophe,
I'm still in the process of installing DSPAM with the Postgresql backend.
Yesterday I ran dspam_train with several thousand spam and ham messages:
Spam AND Ham? Really?
dspam # dspam_stats
christo...@garault.org TP: 0 TN: 5139 FP: 0 FN: 4868 SC: 0 NC: 0
So you have here 5'139 messages that got classified as HAM ([T]rue
[N]egative) and 4'868 messages that got falsely classified as
HAM ([F]alse [N]egative). Somehow this is very, very, very, very
strange. How can DSPAM end up with only TN and FN counts after
processing almost 10K messages, and not a single TP or FP?
Can I make a guess? You are using sbph as Tokenizer.
Something is fishy on your setup. Can you please post your dspam.conf?
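The reading of the dspam_stats counters above can be checked mechanically. A minimal Python sketch, not part of DSPAM: parse_stats and summarize are hypothetical helpers, and the regular expression assumes the "NAME: count" layout of the output quoted here (it also copes with the line being wrapped before a count). On these numbers, all spam that was counted (TP + FN) ended up as FN.

```python
import re

# Matches the "NAME: count" pairs printed by dspam_stats, e.g. "TP: 0".
# Assumption: field names and layout are as in the output quoted above.
FIELD = re.compile(r"\b(TP|TN|FP|FN|SC|NC):\s*(\d+)")


def parse_stats(text):
    """Return a {field: count} dict for the counters in one dspam_stats entry."""
    return {name: int(value) for name, value in FIELD.findall(text)}


def summarize(counts):
    """Total classified messages and the share of spam that slipped through as ham."""
    total = counts["TP"] + counts["TN"] + counts["FP"] + counts["FN"]
    spam_seen = counts["TP"] + counts["FN"]  # spam caught + spam missed
    missed = counts["FN"] / spam_seen if spam_seen else 0.0
    return {"total": total, "spam_missed_ratio": missed}


if __name__ == "__main__":
    line = "christo...@garault.org TP: 0 TN: 5139 FP: 0 FN: 4868 SC: 0 NC: 0"
    counts = parse_stats(line)
    print(counts)
    print(summarize(counts))  # total 10007, spam_missed_ratio 1.0
```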
I have now more than 4 million lines in dspam_token_data for this user
(me).
This is a lot. Just for slightly over 10K messages?
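Why a sparse-binary tokenizer inflates the token table can be illustrated with a small Python sketch of SBPH feature generation in the style described for CRM114: at each word position the current word is always kept, and each of up to four preceding words is either kept or skipped, giving up to 2^4 = 16 features per position instead of one. The window size of 5, the "<skip>" marker and the function name are assumptions for illustration; DSPAM's own sbph tokenizer may differ in its details. For comparison, 4,000,000 rows over the 10,007 messages counted above is roughly 400 stored tokens per message on average.

```python
from itertools import product

# Hypothetical placeholder for a word that a given feature skips.
SKIP = "<skip>"


def sbph_features(words, window=5):
    """Emit SBPH-style features (duplicates included): at each position the
    current word is always kept, and each of the up to (window - 1) preceding
    words is kept or replaced by SKIP, i.e. 2 ** len(history) features."""
    features = []
    for i, current in enumerate(words):
        history = words[max(0, i - (window - 1)):i]
        for mask in product((True, False), repeat=len(history)):
            kept = [w if keep else SKIP for w, keep in zip(history, mask)]
            features.append(" ".join(kept + [current]))
    return features


if __name__ == "__main__":
    words = "this is a test message with a typo in email".split()
    feats = sbph_features(words)
    print(len(words), "words ->", len(feats), "features")  # 10 words -> 111 features
    print(round(4_000_000 / 10_007))  # about 400 stored tokens per message
```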
Today a spam was delivered as an innocent message. I tried to
'retrain' it by sending it to the spam alias but didn't receive any
acknowledgement, despite the text I wrote in
/var/spool/dspam/txt/msgtag.spam
Also, the userPref (spamAction) is 'tag'.
Here are my questions:
Is there any reason why I haven't received this retrained message with
the associated tag in the Subject ([SPAM])?
How do I know that DSPAM will treat similar messages as spam when
there's no column in dspam_signature_data that classifies the message?
Were the tokens of this particular message changed instead?
And where are those tokens stored (I mean in a clear, readable format)?
Is it the data column of dspam_signature_data? I may well have a
problem with that, because the database encoding is UTF8 and while
dspam_train was running my logs were full of errors like this one:
2012-08-15 17:57:01 CEST ERREUR: séquence d'octets invalide pour
l'encodage « UTF8 » : 0xe93634 (which means "ERROR: invalid byte
sequence for encoding UTF8")
2012-08-15 17:57:01 CEST INSTRUCTION : INSERT INTO
dspam_signature_data (uid,signature,length,created_on,data) VALUES
(1,E'502bc6cd31541794952939',2304,CURRENT_DATE,E'\xe964b99787[etc.....]
What version of DSPAM is that?
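The rejected bytes can be reproduced with a short Python check. In ISO-8859-1 (Latin-1) the byte 0xe9 is "é"; in UTF-8 it can only start a three-byte sequence and must be followed by continuation bytes in the range 0x80 to 0xBF, so 0xe9 followed by 0x36 ("6") is invalid. The E'\xe964b9...' literal in the logged INSERT begins with the byte 0xe9 followed by the characters 6 and 4, which appears to be exactly the 0xe93634 the server complained about, so the failing value looks like raw binary signature data rather than readable text. The helper name describe_utf8 is made up for this sketch.

```python
def describe_utf8(raw: bytes) -> str:
    """Decode raw as UTF-8, or report where and why decoding fails."""
    try:
        return "valid UTF-8: " + raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        bad = raw[exc.start:exc.end]
        return f"invalid UTF-8 at offset {exc.start}: {bad.hex()} ({exc.reason})"


if __name__ == "__main__":
    rejected = bytes.fromhex("e93634")      # the bytes from the PostgreSQL error
    print(describe_utf8(rejected))          # fails at offset 0 on byte e9
    print(rejected.decode("latin-1"))       # 'é64' when read as Latin-1
    print("émail".encode("utf-8").hex())    # c3a96d61696c: é is two bytes in UTF-8
    print("émail".encode("latin-1").hex())  # e96d61696c: é is the single byte e9
```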
To make things worse, I've just run dspam on the command line from
within my Maildir directory, again without success:
dspam --user christo...@garault.org --class=spam --source=error --deliver=spam < spammessage.txt
No message, nothing...
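Because the command printed nothing, one way to see whether dspam actually failed is to run the same invocation from a script and inspect its exit status and stderr. A minimal Python sketch using exactly the flags from the command above; retrain_argv and retrain are hypothetical helper names, and it assumes the dspam binary is on PATH and the message file exists.

```python
import subprocess
from pathlib import Path


def retrain_argv(user):
    """Build the retraining command from the post (flags copied verbatim)."""
    return ["dspam", "--user", user, "--class=spam",
            "--source=error", "--deliver=spam"]


def retrain(user, message_path):
    """Feed one message to dspam on stdin; return its exit code and stderr."""
    with Path(message_path).open("rb") as msg:
        result = subprocess.run(retrain_argv(user), stdin=msg,
                                capture_output=True)
    return result.returncode, result.stderr.decode(errors="replace")
```

On the DSPAM host, something like `retrain("<your DSPAM user>", "spammessage.txt")` returns the exit status and any error text that the shell invocation did not show.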
And to finish this long post (I apologize), there's a typo in this
spam: it says 'émail' instead of 'email', which is not a frequent
mistake. So if I look for that token with the dump tool, this is what
I get:
dspam # dspam_dump christo...@garault.org émail
5842976176687468544 S: 00000 I: 00000 P: 0.4000
Doesn't that mean the token was not updated by my previous commands?
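The dump line can be read mechanically as well. A minimal Python sketch, assuming the layout shown above: a numeric token hash, then S: (spam hits), I: (innocent hits) and P: (probability). Under that reading, zero spam hits and zero innocent hits mean the token has no recorded occurrences at all for this user. The names parse_dump and never_trained are hypothetical.

```python
import re

# Assumed layout: "<hash> S: <spam hits> I: <innocent hits> P: <probability>"
DUMP_LINE = re.compile(r"^(\d+)\s+S:\s*(\d+)\s+I:\s*(\d+)\s+P:\s*([0-9.]+)$")


def parse_dump(line):
    """Split one dspam_dump output line into its numeric fields."""
    m = DUMP_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised dspam_dump line: {line!r}")
    token_hash, spam, innocent, prob = m.groups()
    return {
        "hash": int(token_hash),
        "spam_hits": int(spam),  # "00000" -> 0
        "innocent_hits": int(innocent),
        "probability": float(prob),
    }


def never_trained(entry):
    """True when neither hit counter has ever been incremented for this token."""
    return entry["spam_hits"] == 0 and entry["innocent_hits"] == 0


if __name__ == "__main__":
    entry = parse_dump("5842976176687468544 S: 00000 I: 00000 P: 0.4000")
    print(entry)
    print("never trained:", never_trained(entry))  # never trained: True
```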
Thanks in advance for all those willing to help.
Christophe Garault
------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
--
Kind Regards from Switzerland,
Stevan Bajić