Hello list

I need help from PostgreSQL and SQLite gurus. I have invested some time in
making the MySQL storage driver for DSPAM 3.8.0 more robust and in enhancing
it. Currently all of the SQL-based storage drivers are full of small but nasty
errors. They all fail when you set MaxMessageSize in DSPAM high enough and
train with a huge mail.

To illustrate the problem:
1) Increase your MaxMessageSize in DSPAM (make it 10MB, 15MB or more)
2) Download http://www.cs.virginia.edu/~cs101/hws/hw6/markov/textfiles/bible.txt
3) Let DSPAM process and tag the text:
   dspam --user <your_user> --process --deliver=summary --stdout < /path/to/bible.txt

The <your_user> must not be in NOTRAIN mode!

The result will probably be nothing: DSPAM will not output anything. But
sql.errors will probably contain a failed SQL query. And if you run DSPAM in
client/server mode, it will most likely have crashed (well... sort of: it will
keep running, but the connection to your storage engine will be gone for
--client mode).
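
For anyone who wants to script the steps above, here is a rough Python sketch.
It assumes that dspam is on the PATH and that a failed run shows up as missing
output, as described above. The helper names build_command,
looks_like_failure and run are made up for this example and are not part of
DSPAM.

```python
import subprocess
import sys


def build_command(user):
    """Return the dspam invocation from step 3 as an argument list."""
    return ["dspam", "--user", user, "--process", "--deliver=summary", "--stdout"]


def looks_like_failure(output):
    """Assumption: a healthy run prints an X-DSPAM-Result header,
    a failed run prints nothing at all."""
    return "X-DSPAM-Result:" not in output


def run(user, path):
    """Feed the file at `path` to dspam on stdin.

    Returns True if a result header was printed, False otherwise.
    """
    with open(path, "rb") as message:
        proc = subprocess.run(
            build_command(user),
            stdin=message,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    return not looks_like_failure(proc.stdout.decode("utf-8", errors="replace"))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python repro.py <your_user> /path/to/bible.txt
    ok = run(sys.argv[1], sys.argv[2])
    print("result header found" if ok else "no output, check sql.errors")
```

If the script reports no output, sql.errors is the place to look for the
failed query.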

I have fixed that issue on my 3.8.0 installation with MySQL. When I train with
the patched DSPAM 3.8.0, I get this:
mail / # dspam --user globaluser --process --deliver=summary --stdout < /tmp/bible.txt
X-DSPAM-Result: globaluser; result="Innocent"; class="Innocent";
probability=0.0000; confidence=1.00; signature=1,4755c2c538801608415579
mail / #

Checking for signature data I get this:
mail / # mysql --user=$(sed -n "3,1p" /etc/mail/dspam/mysql.data) \
    --password=$(sed -n "4,1p" /etc/mail/dspam/mysql.data) \
    --socket=$(sed -n "1,1p" /etc/mail/dspam/mysql.data) \
    -e "select uid,signature,octet_length(data),length,created_on
        from dspam_signature_data
        where signature='1,4755c2c538801608415579'" \
    $(sed -n "5,1p" /etc/mail/dspam/mysql.data)
+-----+--------------------------+--------------------+---------+------------+
| uid | signature                | octet_length(data) | length  | created_on |
+-----+--------------------------+--------------------+---------+------------+
|   1 | 1,4755c2c538801608415579 |            8804556 | 8804556 | 2007-12-04 |
+-----+--------------------------+--------------------+---------+------------+
mail / #
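
The signature in the query above is taken from the X-DSPAM-Result header of
the training run. As a small sketch, assuming the header always looks like the
one printed above, the signature can be pulled out and the same query built
like this. The function names and the character check on the signature are my
own assumptions, not something DSPAM provides.

```python
import re


def extract_signature(header):
    """Return the signature value from an X-DSPAM-Result header, or None."""
    # The header may be folded over several lines, so collapse whitespace first.
    unfolded = " ".join(header.split())
    match = re.search(r"signature=([^;\s]+)", unfolded)
    return match.group(1) if match else None


def signature_query(signature):
    """Build the query used above for one signature.

    Assumption: signatures consist only of letters, digits and commas.
    Anything else is rejected, because the value is placed into the SQL text.
    """
    if not re.fullmatch(r"[0-9A-Za-z,]+", signature):
        raise ValueError("unexpected characters in signature")
    return (
        "select uid,signature,octet_length(data),length,created_on "
        "from dspam_signature_data "
        "where signature='%s'" % signature
    )


header = (
    'X-DSPAM-Result: globaluser; result="Innocent"; class="Innocent";\n'
    "probability=0.0000; confidence=1.00; signature=1,4755c2c538801608415579"
)
print(signature_query(extract_signature(header)))
```

In the result, octet_length(data) and length should show the same number. If
they differ, one of the two columns has been cut off.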


With MySQL most people will get a length of 32767, since the source uses a
signed long to fill the length column in MySQL. The data will probably be
65535 bytes, which is the maximum for a BLOB type field in MySQL.
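
Both numbers are plain integer limits: 32767 is 2^15 - 1 and 65535 is 2^16 - 1.
The sketch below lists the maximum sizes of the MySQL BLOB family and picks
the smallest type that would hold the 8804556 bytes from the example above.
The function name is made up for illustration, and it says nothing about
which column types a patched schema really uses.

```python
# Maximum sizes in bytes of the MySQL BLOB types.
BLOB_LIMITS = [
    ("TINYBLOB", 2**8 - 1),
    ("BLOB", 2**16 - 1),
    ("MEDIUMBLOB", 2**24 - 1),
    ("LONGBLOB", 2**32 - 1),
]

# Largest value of a signed 16-bit integer, the 32767 seen in the length column.
SIGNED_16BIT_MAX = 2**15 - 1


def smallest_blob_type(size):
    """Return the name of the smallest BLOB type that can hold `size` bytes."""
    for name, limit in BLOB_LIMITS:
        if size <= limit:
            return name
    raise ValueError("too large for any BLOB type")


print(SIGNED_16BIT_MAX)             # 32767
print(smallest_blob_type(65535))    # BLOB
print(smallest_blob_type(8804556))  # MEDIUMBLOB
```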

I have changed a lot of code in the DSPAM 3.8.0 source to get the above
functionality. My problem now is that most of the changes are in the
mysql_drv.c source file, but some are outside that file, and those changes
would affect every other storage engine. So for this change to be useful I
need to change the other storage drivers as well.

I know PostgreSQL and SQLite, but not as well as I know MySQL. So my question,
or request for help, is: is anyone here on the list willing to help me get the
other storage engines working properly? You don't need to be a C coder (it
would help, but it is not required). I just need someone I can ask about
PostgreSQL and/or SQLite if I have storage-specific questions.

Anyone willing to help?


I know that no one is crazy enough to train 766'111 words with an anti-spam
filter, especially not with anything more complex than unigram (I used noise
with osb with burton graham naive and bcr). But knowing that it is possible
with DSPAM makes me more confident in using DSPAM.


// SteveB