Rob, could you open that as a bug?
Rob Mueller writes:
> We're using some custom code to call the spamassassin bayes learning
> routines rather than sa-learn, and have come across a slight bug that meant
> the first messages being learnt were always being lost.
>
> Going through the code here's what I have & what's happening...
>
> -----
>
> $SpamAssassin = Mail::SpamAssassin->new({
> dont_copy_prefs => 1,
> config_text => <<'EOF',
> use_bayes 0
> bayes_file_mode 0777
> bayes_learn_to_journal 1
> bayes_auto_expire 1
> bayes_journal_max_size 0
> bayes_use_hapaxes 0
> bayes_auto_learn 0
> EOF
> });
>
> $SpamAssassin->init();
>
> -----
>
> ... later on when I need to learn some messages for a user ...
>
> -----
>
> # Nasty poke straight into conf object...
> $SpamAssassin->{conf}->{use_bayes} = 1;
> $SpamAssassin->{conf}->{bayes_path} = $LearnDir . "/bayes";
> $SpamAssassin->signal_user_changed({ username => $UnqName });
> $SpamAssassin->init_learner({
> caller_will_untie => 1,
> });
>
> -----
>
> ... get the messages and loop over ...
>
> -----
>
> for-all-messages ... {
> my $Message = $SpamAssassin->parse([EMAIL PROTECTED]);
> my $Learner = $SpamAssassin->learn($Message, undef, $IsSpam);
> $Learner->finish();
> $Message->finish();
> }
> ...
>
> $SpamAssassin->rebuild_learner_caches();
> $SpamAssassin->finish_learner();
>
> -----
>
> The problem occurs because Mail::SpamAssassin::Bayes::learn() has the code:
>
> -----
> if ($self->{main}->{learn_to_journal}) {
> # If we're going to learn to journal, we'll try going r/o first...
> # If that fails for some reason, let's try going r/w. This happens
> # if the DB doesn't exist yet.
> $ok = $self->{store}->tie_db_readonly() ||
> $self->{store}->tie_db_writable();
> } else {
> $ok = $self->{store}->tie_db_writable();
> }
> -----
>
> If no bayes db exists, then
> Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly fails, so it does
> Mail::SpamAssassin::BayesStore::DBM::tie_db_writeable() which succeeds.
>
> However when it tries to learn the second message, it calls
> Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly() again. This closes
> the db for writable access, and succeeds in opening it for readonly access,
> however it doesn't clear the $self->{is_locked} flag.
>
> This means that it thinks the db is still in writable mode, even though it's
> now been tied in read_only mode.
>
> This means that every message learnt attempts to re-tie the db (slow). It
> also means that when we rebuild_learner_caches(), it thinks that the db is
> in writeable mode, so when it calls
> Mail::SpamAssassin::BayesStore::DBM::tie_db_writeable(), it immediately
> returns because it thinks the db is writeable, but then all attempts to
> write to the db are just lost (no error messages, the write to the hash just
> disappears).
>
> It seems the fix is as easy as adding $self->{is_locked} = 0 somewhere in
> tie_db_readonly()
>
> I worked around this by forcibly opening the db in write mode on immediately
> after calling init_learner.
>
> -----
> caller_will_untie => 1,
> });
>
> + # Need to force creation of db
> + my $Scanner = $SpamAssassin->{bayes_scanner};
> + if ($Scanner && $Scanner->{store}) {
> + $Scanner->{store}->tie_db_writable();
> + $Scanner->{store}->untie_db();
> + }
> +
> return 1;
> }
>
> -----
>
> Hmmm, there was another problem with locks being left around as well, but I
> can't remember the exact details again now. I just know that I found a work
> around by changing my code to:
>
> -----
>
> + # Call this first to untie db to release lock
> + $SpamAssassin->finish_learner();
> $SpamAssassin->rebuild_learner_caches();
>
> -----
>
> I'll see if I can track it down again and remind myself what it was...
>
> Rob