http://bugzilla.spamassassin.org/show_bug.cgi?id=4019

------- Additional Comments From [EMAIL PROTECTED]  2004-12-06 12:17 -------
Subject: Re:  BayesSQL token column type for MySQL may end up with semi-bogus data

On Mon, Dec 06, 2004 at 11:35:23AM -0800, [EMAIL PROTECTED] wrote:
> A quick look at SQL.pm...
> 
> dump_db_toks() already uses SQL's RPAD() to re-pad the token.
> 

Oh yeah.  Sorry, my FIFO mind must have completely forgotten about
this, thanks for the pointer.

> backup_database() on the other hand does not do this and proceeds to unpack
> the tokens without re-padding them... which will result in a different value
> than unpacking the same string with trailing space.
> 
> So, if I understand what the problem IS, changing "token" to
> "RPAD(token,5,' ')" when $sql is set in backup_database() should fix the
> problem... unless I'm missing something and the problem also exists when you
> dump a DB (using dump_db_toks).

This is obviously the fix for this piece.
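
To make the difference concrete, here is a minimal sketch in Python (not
the Perl in SQL.pm). It assumes the column hands back the token with its
trailing spaces removed, as the length() output in this thread suggests,
and it uses a hex dump as a stand-in for whatever unpacking
backup_database() really does. The helper names are hypothetical.

```python
TOKEN_WIDTH = 5  # matches the char(5) column shown in this thread


def strip_trailing_spaces(token: bytes) -> bytes:
    """Mimic a CHAR column read: trailing spaces are dropped (assumption)."""
    return token.rstrip(b" ")


def repad(token: bytes, width: int = TOKEN_WIDTH) -> bytes:
    """Rough equivalent of SQL's RPAD(token, 5, ' '): pad, then cut to width."""
    return token.ljust(width, b" ")[:width]


original = b"test "                      # 5-byte token ending in a space
fetched = strip_trailing_spaces(original)

# Hex dump stands in for the real unpack step (assumption).
print(fetched.hex())         # 74657374   -> only 4 bytes survive
print(repad(fetched).hex())  # 7465737420 -> original 5 bytes restored
```

Without the re-pad, any token whose last byte happens to be a space would
be written to the backup one byte short.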

I'm still troubled by the following:

mysql> desc bayes_token2;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| token | char(5) |      | PRI |         |       |
+-------+---------+------+-----+---------+-------+
1 row in set (0.00 sec)

mysql> insert into bayes_token2 values ('test ');
Query OK, 1 row affected (0.00 sec)

mysql> insert into bayes_token2 values ('test1');
Query OK, 1 row affected (0.00 sec)

mysql> insert into bayes_token2 values ('blah ');
Query OK, 1 row affected (0.00 sec)

mysql> insert into bayes_token2 values ('foo  ');
Query OK, 1 row affected (0.00 sec)

mysql> select * from bayes_token2 where token = 'test ';
+-------+
| token |
+-------+
| test  |
+-------+
1 row in set (0.00 sec)

mysql> select * from bayes_token2 where token = 'test';
+-------+
| token |
+-------+
| test  |
+-------+
1 row in set (0.00 sec)

mysql> select * from bayes_token2 where token = 'blah';
+-------+
| token |
+-------+
| blah  |
+-------+
1 row in set (0.00 sec)

mysql> select * from bayes_token2 where token = 'foo';
+-------+
| token |
+-------+
| foo   |
+-------+
1 row in set (0.00 sec)

mysql> select * from bayes_token2 where token = 'foo ';
+-------+
| token |
+-------+
| foo   |
+-------+
1 row in set (0.00 sec)

mysql> select * from bayes_token2 where token = 'foo  ';
+-------+
| token |
+-------+
| foo   |
+-------+
1 row in set (0.00 sec)

mysql> select token, length(token) from bayes_token2;
+-------+---------------+
| token | length(token) |
+-------+---------------+
| blah  |             4 |
| foo   |             3 |
| test  |             4 |
| test1 |             5 |
+-------+---------------+
4 rows in set (0.00 sec)

The fact that 'foo' == 'foo  ' compares as true could cause some sort of
problem, I think, but maybe I'm overanalyzing things.
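
As a rough model of the comparisons in the session above, here is a small
Python sketch that treats two values as equal when they differ only in
trailing spaces. This is an approximation of the behaviour shown, not
MySQL's actual implementation, and pad_space_equal is a hypothetical name.

```python
def pad_space_equal(a: str, b: str) -> bool:
    """Compare two strings while ignoring trailing spaces only."""
    return a.rstrip(" ") == b.rstrip(" ")


# The comparisons from the mysql session, restated:
print(pad_space_equal("foo", "foo  "))   # True
print(pad_space_equal("test", "test "))  # True
print(pad_space_equal("test", "test1"))  # False

# Leading spaces are still significant under this model.
print(pad_space_equal(" foo", "foo"))    # False
```

Under this model, two tokens that differ only in trailing spaces cannot
coexist as separate rows in a column that requires unique values.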

Michael

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
