postfix-users  

Re: PATCH: bogus Berkeley DB warnings (was: smtpd crashes)

Wietse Venema
Sat, 02 Jan 2010 17:25:21 -0800

Wietse Venema:
> Ralf Hildebrandt:
> > * Wietse Venema <wie...@porcupine.org>:
> > > >Jan  1 20:19:41 mail-ausfall postfix/verify[26329]: fatal: close database /var/lib/postfix/verify.db: No such file or directory
> > > 
> > > Does not reproduce on Ubuntu 9.10-server with the default Berkeley DB 4.7.
> > > 
> > > Can you check if this warning (and the warning for postscreen) goes
> > > away when automatic cache cleanup is turned off?
> > > 
> > > address_verify_cache_cleanup_interval = 0
> > > postscreen_cache_cleanup_interval = 0
> > 
> > It never occurred BEFORE the automatic cache cleanup was introduced.
> 
> New errors, bogus or not, happen after a program is changed so that
> it executes code paths that it did not execute before.
> 
> I am going to take a very pragmatic decision. Having established
> that this is a bogus error, I am going to log it as a non-error.

Also released as postfix-2.7-20100102, with HISTORY file entry:

    Workaround: don't report bogus Berkeley DB close errors as
    fatal errors. All operations before close are already error
    checked, so the data is known to be safe.  File: util/dict_db.c.

Having spent the better part of today on bogus DB errors, I am now
going to spend the rest of this break on non-Postfix things.

        Wietse

> If someone can figure out how to reliably reproduce this, I am
> mildly interested.
> 
>       Wietse
> 
> *** ./dict_db.c-      Thu Jan  4 09:06:07 2007
> --- ./dict_db.c       Sat Jan  2 16:28:08 2010
> ***************
> *** 535,542 ****
>   #endif
>       if (DICT_DB_SYNC(dict_db->db, 0) < 0)
>       msg_fatal("flush database %s: %m", dict_db->dict.name);
>       if (DICT_DB_CLOSE(dict_db->db) < 0)
> !     msg_fatal("close database %s: %m", dict_db->dict.name);
>       if (dict_db->key_buf)
>       vstring_free(dict_db->key_buf);
>       if (dict_db->val_buf)
> --- 535,553 ----
>   #endif
>       if (DICT_DB_SYNC(dict_db->db, 0) < 0)
>       msg_fatal("flush database %s: %m", dict_db->dict.name);
> + 
> +     /*
> +      * With some Berkeley DB implementations, close fails with a bogus ENOENT
> +      * error, while it reports no errors with put+sync, no errors with
> +      * del+sync, and no errors with the sync operation just before this
> +      * comment. This happens in programs that never fork and that never share
> +      * the database with other processes. The bogus close error has been
> +      * reported for programs that use the first/next iterator. Instead of
> +      * making Postfix look bad because it reports errors that other programs
> +      * ignore, I'm going to report the bogus error as a non-error.
> +      */
>       if (DICT_DB_CLOSE(dict_db->db) < 0)
> !     msg_info("close database %s: %m", dict_db->dict.name);
>       if (dict_db->key_buf)
>       vstring_free(dict_db->key_buf);
>       if (dict_db->val_buf)
> 
>
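
The pattern in the patch above can be sketched outside Postfix as
follows. This is only a minimal illustration under stated assumptions,
not the dict_db.c code: toy_db, toy_sync, toy_close and toy_shutdown
are hypothetical names, the failures are simulated through fields in
the struct rather than produced by Berkeley DB, and a return code
stands in for msg_fatal() (which terminates the process in Postfix).
The point it shows is the ordering: the flush is still checked
strictly, and only a failure of the final close is downgraded to an
informational message.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a database handle; NOT the Berkeley DB API. */
typedef struct {
    const char *name;
    int sync_errno;   /* errno that the simulated sync fails with, 0 = success */
    int close_errno;  /* errno that the simulated close fails with, 0 = success */
} toy_db;

typedef enum {
    SHUTDOWN_OK,           /* sync and close both succeeded */
    SHUTDOWN_CLOSE_LOGGED, /* close failed; logged as info, not treated as fatal */
    SHUTDOWN_SYNC_FAILED   /* sync failed; the caller should treat this as fatal */
} shutdown_result;

/* Simulated flush: fails with the configured errno, if any. */
int toy_sync(toy_db *db)
{
    if (db->sync_errno != 0) {
        errno = db->sync_errno;
        return -1;
    }
    return 0;
}

/* Simulated close: fails with the configured errno, if any. */
int toy_close(toy_db *db)
{
    if (db->close_errno != 0) {
        errno = db->close_errno;
        return -1;
    }
    return 0;
}

/*
 * Flush, then close. A flush error is reported as fatal because the
 * data may not be safe. A close error after a successful flush is only
 * logged, on the assumption that every earlier operation was already
 * error checked.
 */
shutdown_result toy_shutdown(toy_db *db)
{
    assert(db != NULL);

    if (toy_sync(db) < 0) {
        fprintf(stderr, "fatal: flush database %s: %s\n",
                db->name, strerror(errno));
        return SHUTDOWN_SYNC_FAILED;
    }
    if (toy_close(db) < 0) {
        fprintf(stderr, "info: close database %s: %s\n",
                db->name, strerror(errno));
        return SHUTDOWN_CLOSE_LOGGED;
    }
    return SHUTDOWN_OK;
}
```

With close_errno set to ENOENT and sync_errno left at 0, toy_shutdown()
logs an "info:" line and returns SHUTDOWN_CLOSE_LOGGED, so the caller
keeps running. With sync_errno set, it returns SHUTDOWN_SYNC_FAILED and
the caller is expected to stop.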