https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8315
Bug ID: 8315
Summary: BayesStore/SQL regression when using MySQL defaults
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Hardware: PC
OS: Linux
Status: NEW
Severity: minor
Priority: P2
Component: Learner
Assignee: dev@spamassassin.apache.org
Reporter: d...@sr71.net
Target Milestone: Undefined

Created attachment 6000 --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=6000&action=edit
fix the SQLite concat syntax

I noticed that my Bayes learning was not working well. The biggest symptom was lots of new tokens but very few hammy, neutral, or spammy tokens. For instance:

X-Spam-TokenSummary: Tokens: new, 177; hammy, 0; neutral, 1; spammy, 1.
X-Spam-TokenSummary: Tokens: new, 178; hammy, 0; neutral, 0; spammy, 2.
X-Spam-TokenSummary: Tokens: new, 104; hammy, 1; neutral, 1; spammy, 0.

I've been using the following configuration for a loooooooong time, probably 10+ years:

    bayes_store_module Mail::SpamAssassin::BayesStore::SQL

A change in 2022 [1] switched the default SQL to use "||" as the string concatenation operator. That's fine in SQLite, but in MySQL (and MariaDB) "||" is a logical OR by default; it only means concatenation when the PIPES_AS_CONCAT SQL mode is enabled [2]. As a result, the generated SQL ended up returning a boolean value instead of a string for the token, as this query run by hand shows:

MariaDB [spamassassin]> SELECT SUBSTR(token || ' ', 1, 5), spam_count, ham_count, atime from bayes_token limit 10;
+--------------------------------+------------+-----------+------------+
| SUBSTR(token || ' ', 1, 5)     | spam_count | ham_count | atime      |
+--------------------------------+------------+-----------+------------+
| 0                              |          0 |         1 | 1696434003 |
| 0                              |          0 |         3 | 1696434018 |
| 0                              |          0 |         6 | 1696441099 |
| 0                              |          0 |         1 | 1696434008 |
| 0                              |          0 |         2 | 1696440870 |
| 0                              |          0 |         3 | 1696440394 |
| 0                              |          0 |         1 | 1696434011 |
| 0                              |          0 |         2 | 1696445725 |
| 0                              |          0 |         1 | 1696441419 |
| 0                              |          0 |         1 | 1696433986 |
+--------------------------------+------------+-----------+------------+

Basically, the token was either 0 or 1.
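To illustrate the dialect difference described above, here is a minimal Perl sketch of a hypothetical helper, concat_sql, that picks the concatenation syntax from a DBI driver name (in real code that name would come from $dbh->{Driver}{Name}). The helper name and the driver-name list are assumptions for illustration; this is not the attached patch and not SpamAssassin's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SpamAssassin): build an SQL expression
# that concatenates the given SQL fragments in the dialect of the driver.
# In MySQL/MariaDB "||" is a logical OR unless sql_mode includes
# PIPES_AS_CONCAT, so CONCAT() is used there instead.
sub concat_sql {
    my ($driver, @parts) = @_;
    if ($driver =~ /^(?:mysql|MariaDB)$/) {
        return 'CONCAT(' . join(', ', @parts) . ')';
    }
    # SQLite and PostgreSQL treat "||" as string concatenation.
    return join(' || ', @parts);
}

# Usage: driver names as reported by DBD::mysql and DBD::SQLite.
print concat_sql('mysql',  'token', q{' '}), "\n";   # CONCAT(token, ' ')
print concat_sql('SQLite', 'token', q{' '}), "\n";   # token || ' '
```

Branching on the driver keeps the SQLite query unchanged while removing any dependence on the MySQL server's sql_mode.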
This loop in SpamAssassin/Plugin/Bayes.pm then only ever sees $token as "0" or "1":

    foreach my $tokendata (@{$tokensdata}) {
        ...
        my ($token, $tok_spam, $tok_ham, $atime) = @{$tokendata};
        $pw{$token} = { ... };
    }

Because %pw is keyed by $token, rows with the same key overwrite each other, so the hash holds at *most* two tokens. That explains the low token counts I see coming out of the database.

The issue can be worked around by using:

    bayes_store_module Mail::SpamAssassin::BayesStore::MySQL

but I think it should probably be fixed in case other folks are using plain "SQL", not "MySQL". A totally untested patch is attached.

1. https://svn.apache.org/viewvc?view=revision&revision=1899738
2. https://dev.mysql.com/doc/refman/8.4/en/sql-mode.html#sqlmode_pipes_as_concat
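A minimal Perl sketch of the hash collapse in the Bayes.pm loop. The first three rows mirror the MariaDB output in this report; the fourth row (token "1") is hypothetical, and the hash-value field names are placeholders because the report elides the real ones.

```perl
use strict;
use warnings;

# Rows shaped like @{$tokensdata}: [token, spam_count, ham_count, atime].
# With "||" parsed as OR, the token column comes back as "0" or "1".
my $tokensdata = [
    [ '0', 0, 1, 1696434003 ],
    [ '0', 0, 3, 1696434018 ],
    [ '0', 0, 6, 1696441099 ],
    [ '1', 1, 0, 1696440870 ],   # hypothetical row
];

my %pw;
foreach my $tokendata (@{$tokensdata}) {
    my ($token, $tok_spam, $tok_ham, $atime) = @{$tokendata};
    # Placeholder fields. A later row with the same $token replaces
    # the earlier entry, so distinct tokens are silently lost.
    $pw{$token} = { spam => $tok_spam, ham => $tok_ham, atime => $atime };
}

# Prints "4 rows -> 2 distinct tokens".
printf "%d rows -> %d distinct tokens\n",
    scalar(@{$tokensdata}), scalar(keys %pw);
```

However many rows the query returns, %pw can never hold more than the two keys "0" and "1".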