Rob, could you open that as a bug?

Rob Mueller writes:
> We're using some custom code to call the spamassassin bayes learning 
> routines rather than sa-learn, and have come across a slight bug that meant 
> the first messages being learnt were always being lost.
> 
> Going through the code here's what I have & what's happening...
> 
> -----
> 
>     $SpamAssassin = Mail::SpamAssassin->new({
>       dont_copy_prefs => 1,
>       config_text => <<'EOF',
> use_bayes 0
> bayes_file_mode 0777
> bayes_learn_to_journal 1
> bayes_auto_expire 1
> bayes_journal_max_size 0
> bayes_use_hapaxes 0
> bayes_auto_learn 0
> EOF
>     });
> 
>     $SpamAssassin->init();
> 
> -----
> 
> ... later on when I need to learn some messages for a user ...
> 
> -----
> 
>   # Nasty poke straight into conf object...
>   $SpamAssassin->{conf}->{use_bayes} = 1;
>   $SpamAssassin->{conf}->{bayes_path} = $LearnDir . "/bayes";
>   $SpamAssassin->signal_user_changed({ username => $UnqName });
>   $SpamAssassin->init_learner({
>     caller_will_untie => 1,
>   });
> 
> -----
> 
> ... get the messages and loop over ...
> 
> -----
> 
>     for-all-messages ... {
>       my $Message = $SpamAssassin->parse([EMAIL PROTECTED]);
>       my $Learner = $SpamAssassin->learn($Message, undef, $IsSpam);
>       $Learner->finish();
>       $Message->finish();
>     }
> ...
> 
>   $SpamAssassin->rebuild_learner_caches();
>   $SpamAssassin->finish_learner();
> 
> -----
> 
> The problem occurs because Mail::SpamAssassin::Bayes::learn() has the code:
> 
> -----
>     if ($self->{main}->{learn_to_journal}) {
>       # If we're going to learn to journal, we'll try going r/o first...
>       # If that fails for some reason, let's try going r/w.  This happens
>       # if the DB doesn't exist yet.
>       $ok = $self->{store}->tie_db_readonly() ||
> $self->{store}->tie_db_writable();
>     } else {
>       $ok = $self->{store}->tie_db_writable();
>     }
> -----
> 
> If no bayes db exists, then
> Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly fails, so it does
> Mail::SpamAssassin::BayesStore::DBM::tie_db_writeable() which succeeds.
> 
> However when it tries to learn the second message, it calls 
> Mail::SpamAssassin::BayesStore::DBM::tie_db_readonly() again. This closes 
> the db for writable access, and succeeds in opening it for readonly access, 
> however it doesn't clear the $self->{is_locked} flag.
> 
> This means that it thinks the db is still in writable mode, even though it's
> now been tied in read_only mode.
> 
> This means that every message learnt attempts to re-tie the db (slow). It
> also means that when we rebuild_learner_caches(), it thinks that the db is
> in writeable mode, so when it calls
> Mail::SpamAssassin::BayesStore::DBM::tie_db_writeable(), it immediately
> returns because it thinks the db is writeable, but then all attempts to
> write to the db are just lost (no error messages, the write to the hash just
> disappears).
> 
> It seems the fix is as easy as adding $self->{is_locked} = 0 somewhere in
> tie_db_readonly()
> 
> I worked around this by forcibly opening the db in write mode on immediately 
> after calling init_learner.
> 
> -----
>      caller_will_untie => 1,
>    });
> 
> +  # Need to force creation of db
> +  my $Scanner = $SpamAssassin->{bayes_scanner};
> +  if ($Scanner && $Scanner->{store}) {
> +    $Scanner->{store}->tie_db_writable();
> +    $Scanner->{store}->untie_db();
> +  }
> +
>    return 1;
>  }
> 
> -----
> 
> Hmmm, there was another problem with locks being left around as well, but I 
> can't remember the exact details again now. I just know that I found a work 
> around by changing my code to:
> 
> -----
> 
> +  # Call this first to untie db to release lock
> +  $SpamAssassin->finish_learner();
>    $SpamAssassin->rebuild_learner_caches();
> 
> -----
> 
> I'll see if I can track it down again and remind myself what it was...
> 
> Rob

Reply via email to