https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6624

             Bug #: 6624
           Summary: BayesStore/MySQL.pm fails to update tokens due to
                    MySQL server bug (wrong count of rows affected)
           Product: Spamassassin
           Version: 3.3.2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: P2
         Component: Libraries
        AssignedTo: [email protected]
        ReportedBy: [email protected]
    Classification: Unclassified


Dave Wreski reported on the users ML on 2011-06-21:

> I have an existing v3.3.2 on fedora14 (perl v5.12.3) that I'm trying to 
> convert bayes to use mysql. The restore process fails after a few 
> minutes due to too many errors:
>   dbg: bayes: error inserting token for line: t 1 0 1308114254 4fd2b3f2f0
>   dbg: bayes: _put_token: Updated an unexpected number of rows.
>   bayes: encountered too many errors (20) while parsing token line, 
>     reverting to empty database and exiting
> mysql  Ver 14.14 Distrib 5.1.56, for redhat-linux-gnu (x86_64)

Further discussion and debugging improvements revealed:
  dbg: bayes: _put_token: Updated an unexpected number of rows: 3,
    id: 3, token: ....

As it turns out, this is a MySQL server bug, or at least an
undocumented change. A web search shows that others have already run
into it: INSERT ... ON DUPLICATE KEY UPDATE returns 3 instead of 2 as
the rows-changed count. The bug has apparently not been resolved yet
(I tried it with MySQL 5.5.13) and seems to have been forgotten:

  http://bugs.mysql.com/bug.php?id=46675
  http://dev.mysql.com/doc/refman/5.5/en/mysql-affected-rows.html

The relevant piece of information from 46675 is:

> [25 Aug 2009 17:57] Paul DuBois
> 
> Aside from the issue noted by Mark that part of the Connector/J doc info
> isn't getting into the manual, I think this is actually a server bug.
> Left unexplained by any of the preceding discussion is why there should
> be a server version difference (5.0 returns 2, 5.1 returns 3). I created
> a similar test program using Perl DBI, which has a mysql_client_found_rows
> flag that can be enabled or disabled at connect time, and here is what
> I find when executing the INSERT ... ON DUPLICATE KEY UPDATE statement
> and checking the rows return count.
> 
> mysql_client_found_rows = 0: The second INSERT returns a row count of 2
> in all MySQL versions.
> 
> mysql_client_found_rows = 1: The second INSERT returns this row count:
> 
> Before MySQL 5.1.20: 2
> MySQL 5.1.20: undef on Mac OS X, 139775481 on Linux
>   (initialized value? garbage?)
> MySQL 5.1.21 and up: 3
> 
> Looking in the 5.1.20 changelog, I see Bug#28505 which concerns
> mysql_affected_rows() and CLIENT_FOUND_ROWS. However, this change was
> supposed to have been made in both 5.0.44 and 5.1.20, and the change
> in row count to return 3 occurs only in 5.1. (I checked 5.0.43,
> 5.0.44, 5.0.45 and all of them return 2 rows, expected.)
> 
> It looks to me like something went wrong with the 5.1 fix. I don't know
> why there was a change from returning undef/139775481 to returning 3
> between 5.1.20 and 5.1.21. I don't see anything that looks like it's
> relevant in the 5.1.21 changelog.


The effect on SpamAssassin is that a token can be inserted once, but
its counts can never be increased afterwards, leading to terrible
bayes results if the bug goes unnoticed. The conversion from db also
fails, as reported by Dave.

Attached is a patch for lib/Mail/SpamAssassin/BayesStore/MySQL.pm to
provide a workaround for the MySQL server bug, and improved debug logging.
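
To illustrate the idea behind such a workaround, here is a minimal
sketch in Python. It is not the attached patch, and every name in it is
hypothetical. It assumes the failing check accepted only the documented
row counts for INSERT ... ON DUPLICATE KEY UPDATE (1 for a newly
inserted row, 2 for an updated one), and shows a check that also
tolerates the count of 3 seen on MySQL 5.1.21 and later when
mysql_client_found_rows is enabled.

```python
# Hypothetical names throughout; this is an illustration, not the attached patch.

# Row counts MySQL documents for INSERT ... ON DUPLICATE KEY UPDATE:
#   1 = a new row was inserted, 2 = an existing row was updated.
DOCUMENTED_COUNTS = {1, 2}

# Count observed on MySQL 5.1.21 and later with mysql_client_found_rows = 1
# (see MySQL bug 46675).
BUGGY_UPDATE_COUNT = 3


def put_token_ok(rows_affected, tolerate_server_bug=True):
    """Return True if the affected-rows count indicates a successful upsert."""
    if rows_affected in DOCUMENTED_COUNTS:
        return True
    # Assumption: a count of 3 is treated as "row updated" when tolerated.
    return tolerate_server_bug and rows_affected == BUGGY_UPDATE_COUNT


if __name__ == "__main__":
    for count in (1, 2, 3):
        strict = put_token_ok(count, tolerate_server_bug=False)
        tolerant = put_token_ok(count)
        print(f"rows={count} strict={strict} tolerant={tolerant}")
```

With the strict variant, a count of 3 is rejected, which corresponds to
the "Updated an unexpected number of rows" message quoted above; the
tolerant variant accepts it.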

