Hello list,

I need help from PostgreSQL and SQLite gurus. I have invested some time in making the MySQL storage driver for DSPAM 3.8.0 more solid and in enhancing it. Currently all of the SQL-based storage drivers are full of small but nasty errors. They all fail when you set MaxMessageSize in DSPAM high enough and train with a huge mail.

To illustrate the problem:

1) Increase your MaxMessageSize in DSPAM (make it 10MB, 15MB or more).
2) Download http://www.cs.virginia.edu/~cs101/hws/hw6/markov/textfiles/bible.txt
3) Let DSPAM process and tag the text:

   dspam --user <your_user> --process --deliver=summary --stdout < /path/to/bible.txt

   The <your_user> must not be in NOTRAIN mode!

The result will probably be nothing: DSPAM will not output anything, but sql.errors will probably contain a failed SQL query. And a DSPAM running in client/server mode will almost certainly have crashed (well... sort of: it will keep running, but the connection to your storage engine will be gone for --client mode).

I have fixed that issue on my 3.8.0 installation with MySQL. When I train with the patched DSPAM 3.8.0 I get this:

mail / # dspam --user globaluser --process --deliver=summary --stdout < /tmp/bible.txt
X-DSPAM-Result: globaluser; result="Innocent"; class="Innocent"; probability=0.0000; confidence=1.00; signature=1,4755c2c538801608415579
mail / #

Checking for the signature data I get this:

mail / # mysql --user=$(sed -n "3,1p" /etc/mail/dspam/mysql.data) --password=$(sed -n "4,1p" /etc/mail/dspam/mysql.data) --socket=$(sed -n "1,1p" /etc/mail/dspam/mysql.data) -e "select uid,signature,octet_length(data),length,created_on from dspam_signature_data where signature='1,4755c2c538801608415579'" $(sed -n "5,1p" /etc/mail/dspam/mysql.data)
+-----+--------------------------+--------------------+---------+------------+
| uid | signature                | octet_length(data) | length  | created_on |
+-----+--------------------------+--------------------+---------+------------+
|   1 | 1,4755c2c538801608415579 |            8804556 | 8804556 | 2007-12-04 |
+-----+--------------------------+--------------------+---------+------------+
mail / #

With MySQL most people will get a length of 32767, since in the source a signed long is used for filling length in MySQL. The size of data will probably be 65535 bytes, which is the maximum for a BLOB type field in MySQL.
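
To put numbers on those two limits, here is a minimal C sketch that picks the smallest MySQL column type able to hold a given byte count. The size limits are MySQL's documented maxima for the BLOB family and for signed integer columns. The helper names (smallest_blob_for, smallest_int_for, check_bible_signature) are made up for this illustration and are not part of DSPAM, and the mail does not say which column types the patched schema actually uses, so treat this as arithmetic only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A MySQL column type and the largest value or byte count it can hold. */
struct col_type {
    const char *name;
    unsigned long long max;
};

/* Documented MySQL maximum sizes (bytes) for the BLOB family. */
static const struct col_type BLOB_TYPES[] = {
    { "TINYBLOB",   255ULL },
    { "BLOB",       65535ULL },
    { "MEDIUMBLOB", 16777215ULL },
    { "LONGBLOB",   4294967295ULL },
};

/* Documented MySQL maximum values for signed integer columns. */
static const struct col_type INT_TYPES[] = {
    { "SMALLINT",  32767ULL },
    { "MEDIUMINT", 8388607ULL },
    { "INT",       2147483647ULL },
};

/* Return the first (smallest) type whose limit is >= n, or NULL if none fits. */
static const char *smallest_fit(const struct col_type *types, size_t count,
                                unsigned long long n)
{
    for (size_t i = 0; i < count; i++) {
        if (n <= types[i].max)
            return types[i].name;
    }
    return NULL;
}

/* Hypothetical helper: smallest BLOB type that can store n bytes. */
const char *smallest_blob_for(unsigned long long n)
{
    return smallest_fit(BLOB_TYPES, sizeof BLOB_TYPES / sizeof BLOB_TYPES[0], n);
}

/* Hypothetical helper: smallest signed integer type that can store n. */
const char *smallest_int_for(unsigned long long n)
{
    return smallest_fit(INT_TYPES, sizeof INT_TYPES / sizeof INT_TYPES[0], n);
}

/* Sanity check using the signature size reported in this mail. */
void check_bible_signature(void)
{
    const unsigned long long sig_bytes = 8804556ULL;

    /* 8804556 bytes is far beyond a plain BLOB (65535 bytes). */
    assert(strcmp(smallest_blob_for(sig_bytes), "MEDIUMBLOB") == 0);
    /* The length value also overflows SMALLINT (32767) and MEDIUMINT. */
    assert(strcmp(smallest_int_for(sig_bytes), "INT") == 0);
}
```

Calling check_bible_signature() from a small main() confirms that the 8804556-byte signature from the bible.txt run fits neither in a BLOB data column nor in a length column capped at 32767.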

I have changed a lot of code in the DSPAM 3.8.0 source to get the above functionality. My problem now is that most of the changes are in the mysql_drv.c source file, but some are outside that file, and those changes would affect every other storage engine. So for this change to be useful I need to change the other storage drivers as well. I know PostgreSQL and SQLite, but not as well as I know MySQL.

So my question, or request for help, is: is anyone here on the list willing to help me get the other storage engines working the proper way? You don't need to be a C coder (it would help, but it is not required). I just need someone I can ask about PostgreSQL and/or SQLite when I have storage-specific questions. Anyone willing to help?

I know that no one is crazy enough to train 766'111 words with an anti-spam filter, especially not with anything more complex than unigram (I used noise with osb with burton graham naive and bcr). But knowing that it is possible with DSPAM makes me more confident about using DSPAM.

// SteveB
