We're using some custom code to call the spamassassin bayes learning
routines rather than sa-learn, and have come across a slight bug that meant
the first messages being learnt were always being lost.
Going through the code here's what I have & what's happening...
-----
$SpamAssassin = Mail::SpamAssassin->new({
dont_copy_prefs => 1,
config_text => <<'EOF',
use_bayes 0
bayes_file_mode 0777
bayes_learn_to_journal 1
bayes_auto_expire 1
bayes_journal_max_size 0
bayes_use_hapaxes 0
bayes_auto_learn 0
EOF
});
$SpamAssassin->init();
-----
... later on when I need to learn some messages for a user ...
-----
# Nasty poke straight into conf object...
$SpamAssassin->{conf}->{use_bayes} = 1;
$SpamAssassin->{conf}->{bayes_path} = $LearnDir . "/bayes";
$SpamAssassin->signal_user_changed({ username => $UnqName });
$SpamAssassin->init_learner({
caller_will_untie => 1,
});
-----
... get the messages and loop over ...
-----
for-all-messages ... {
my $Message = $SpamAssassin->parse([EMAIL PROTECTED]);
my $Learner = $SpamAssassin->learn($Message, undef, $IsSpam);
$Learner->finish();
$Message->finish();
}
...
$SpamAssassin->rebuild_learner_caches();
$SpamAssassin->finish_learner();
-----
The problem occurs because Mail::SpamAssassin::Bayes::learn() has the code:
-----
if ($self->{main}->{learn_to_journal}) {
# If we're going to learn to journal, we'll try going r/o first...
# If that fails for some reason, let's try going r/w. This happens
# if the DB doesn't exist yet.
$ok = $self->{store}->tie_db_readonly() ||
$self->{store}->tie_db_writable();
} else {
$ok = $self->{store}->tie_db_writable();
}
-----
If no bayes db exists, then
Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly fails, so it does
Mail::SpamAssassin::BayesStore::DBM::tie_db_writeable() which succeeds.
However when it tries to learn the second message, it calls
Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly() again. This closes
the db for writable access, and succeeds in opening it for readonly access,
however it doesn't clear the $self->{is_locked} flag.
This means that it thinks the db is still in writable mode, even though it's
now been tied in read_only mode.
This means that every message learnt attempts to re-tie the db (slow). It
also means that when we rebuild_learner_caches(), it thinks that the db is
in writeable mode, so when it calls
Mail::SpamAssassin::BayesStore::DBM::tie_db_writeable(), it immediately
returns because it thinks the db is writeable, but then all attempts to
write to the db are just lost (no error messages, the write to the hash just
disappears).
It seems the fix is as easy as adding $self->{is_locked} = 0 somewhere in
tie_db_readonly()
I worked around this by forcibly opening the db in write mode on immediately
after calling init_learner.
-----
caller_will_untie => 1,
});
+ # Need to force creation of db
+ my $Scanner = $SpamAssassin->{bayes_scanner};
+ if ($Scanner && $Scanner->{store}) {
+ $Scanner->{store}->tie_db_writable();
+ $Scanner->{store}->untie_db();
+ }
+
return 1;
}
-----
Hmmm, there was another problem with locks being left around as well, but I
can't remember the exact details again now. I just know that I found a work
around by changing my code to:
-----
+ # Call this first to untie db to release lock
+ $SpamAssassin->finish_learner();
$SpamAssassin->rebuild_learner_caches();
-----
I'll see if I can track it down again and remind myself what it was...
Rob