The longer story is in here:

        https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8315

But the short version is: I have a spamassassin config that is a bit
old. I'd say I originally wrote it in ~2005. It's been happily using:

bayes_store_module           Mail::SpamAssassin::BayesStore::SQL

for many, many years with both Postgres and MySQL. But, after a recent
Debian upgrade, bayes filtering broke because BayesStore::SQL stopped
working for MySQL. The failure mode is really insidious because the
tokens go in the database just fine and spam/ham classification
_happens_, just very badly.

The Mail::SpamAssassin::BayesStore::SQL manpage has been updated to
say not to use BayesStore::SQL with MySQL, which is great. But I
suspect that won't reach many existing users.  sql/README.bayes also
talks about BayesStore::MySQL as being a tiny optimization over
BayesStore::SQL and doesn't talk about this landmine. So the docs are
inconsistent at best.

Considering that my config worked and worked for a *LONG* time, could
we add a little check to the BayesStore::SQL code to make it fail more
quickly and gracefully?

The attached patch looks at the DBI handle and checks for "mysql" and
"Pg" databases. If it sees one, it refuses to proceed and spits an
error into the debug log. This should be a lot easier on users because
the failure is much more obvious and explicit.

I also updated the bayes README just to make it clear that you *must*
use the right module for your database. It's not just a "small boost
in performance".

If this is something folks are interested merging, I'd appreciate any
guidance for how best to submit these changes. I'm happy to send along
individual patches or open up Bugzilla entries for these. Just let me
know what's easiest on you folks.
diff --git a/lib/Mail/SpamAssassin/BayesStore/SQL.pm b/lib/Mail/SpamAssassin/BayesStore/SQL.pm
index 16e4e787d..f14250d9a 100644
--- a/lib/Mail/SpamAssassin/BayesStore/SQL.pm
+++ b/lib/Mail/SpamAssassin/BayesStore/SQL.pm
@@ -1676,6 +1676,15 @@ sub _connect_db {
     dbg("bayes: database connection established");
   }
 
+  # This module is no longer compatible with MySQL or Postgres. But
+  # some users have previously-working old configs.  Warn them.
+  my $name = $dbh->{Driver}->{Name};
+  if ($name =~ /^mysql$/i ||
+      $name =~ /^Pg$/i) {
+    dbg("bayes: error: attempt to use generic SQL module for $name");
+    return 0;
+  }
+
   # SQLite PRAGMA attributes - here for tests, see bug 8033
   if ($self->{_dsn} =~ /^dbi:SQLite:.*?;(.+)/i) {
     foreach my $attr (split(/;/, $1)) {
diff --git a/sql/README.bayes b/sql/README.bayes
index 5cce678c7..cf619a774 100644
--- a/sql/README.bayes
+++ b/sql/README.bayes
@@ -2,51 +2,32 @@
 Using A SQL Database for Bayesian Storage Module
 -------------------------------------------------------
 
-SpamAssassin can now store users' bayesian filter data in a SQL
-database.  The most common use for a system like this would be for
-users to be able to have per user bayesian filter data on systems
-where users may not have a home directory to store the data.
+SpamAssassin can store users' bayesian filter data in a SQL database.
+The most common use for a system like this would be for users to be
+able to have per user bayesian filter data on systems where users may
+not have a home directory to store the data.
 
 In order to activate the SQL based bayesian storage you have to
 configure spamassassin and spamd to use a different bayes storage
 module.  This can be done via a setting in the global configuration
-file.
+file based on the type of database being used.
 
-The directives required to turn on the SQL based bayesian storage are:
+For SQLite (or possibly other standard SQL servers):
 
 bayes_store_module		   Mail::SpamAssassin::BayesStore::SQL
 
-This directive is used by the Bayes module to determine which storage
-module should be used. If not set it will default to:
-Mail::SpamAssassin::BayesStore::DBM
-
-The storage module Mail::SpamAssassin::BayesStore::SQL is an older generic
-SQL module which can be also be used with versions of MySQL which did not
-have support for an InnoDB engine and transactions. If choosing this module
-consider replacing the InnoDB engine with MyISAM (explicitly or defaulted)
-in the schema (files bayes_mysql.sql and awl_mysql.sql). Note that old
-versions of MySQL expect syntax TYPE=MyISAM instead of ENGINE=MyISAM,
-while newer versions throw a syntax error on TYPE and only allow ENGINE.
-In short: replace ENGINE=InnoDB with TYPE=MyISAM (or just leave it out)
-in the bayes_mysql.sql and awl_mysql.sql schemas if ENGINE=InnoDB is not
-accepted.
-
-There is also a MySQL specific storage driver available to provides a
-small boost in performance.  It requires version 4.1 or above of the
-MySQL database software to work properly.  In addition, it provides
-rollback on error functionality if you create your bayes database table
-using the InnoDB storage engine. WARNING: Using this module with a version
-of MySQL < 4.1 could have unexpected results.  To use the MySQL 4.1+
-specific module set your bayes_store_module directive accordingly:
-  bayes_store_module               Mail::SpamAssassin::BayesStore::MySQL
-
-PostgreSQL users will want to use the PostgreSQL specific storage
-module:
-  bayes_store_module               Mail::SpamAssassin::BayesStore::PgSQL
-This module provides a slightly different interface to makes better
-use of the resources that PostgreSQL offers.  In addition, please make
-sure that you follow the instructions below for loading the proper
-procedural language and installing the tables and stored procedure.
+For MySQL/MariaDB or PgSQL, you must use a specific directive for the
+database type. They are not compatible with the "SQL" module:
+
+bayes_store_module		   Mail::SpamAssassin::BayesStore::MySQL
+bayes_store_module		   Mail::SpamAssassin::BayesStore::PgSQL
+
+If you are using an old configuration with MySQL and see errors with
+ENGINE=InnoDB, replace it with TYPE=MyISAM (or just leave it out).
+
+For PgSQL, please make sure that you follow the instructions below for
+loading the proper procedural language and installing the tables and
+stored procedure.
 
 Additional configuration directives provided by BayesSQL:
 

Reply via email to