--On Tuesday, September 04, 2001 6:01 PM -0400 Lawrence Greenfield 
<[EMAIL PROTECTED]> wrote:

>    Date: Mon, 27 Aug 2001 09:38:26 -0400
>    From: Scott Adkins <[EMAIL PROTECTED]>
>
>    [ ... many duplicate deliverdb errors deleted ... ]
>
>    [ And so on, and so forth... hitting all the delivery databases ]
>    [ there is a mixture of duplicate_check and duplicate-mark entries also ]
>
> You definitely need to run recovery on your duplicate delivery
> database.  I might just delete the database entirely; since the
> duplicate delivery database isn't transactional, it can't guarantee
> consistency, and there's no critical database in it.

We stopped the server and nuked the delivery files altogether.  This only
worked for 2 days, however, and now we are back to where we were.  We see
a constant stream of duplicate delivery database errors...

> To run recover, you must kill all lmtpd's (and any other processes that
> might have the database open) and run ctl_deliver -r.  Just stopping
> (and waiting) and starting the master process should do this.

I was curious how we could do this without stopping our production server.
So, basically, I need to turn off the cron jobs that process the mail queues
(and thus talk to lmtp via TCP), kill off any sendmail processes currently
doing queue processing, then kill off any lmtp daemons.  At that point, I can
run ctl_deliver -r.
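
To make the ordering concrete, here is a rough sketch of that sequence as a
small Python driver.  Only "ctl_deliver -r" comes from this thread; the two
script paths are hypothetical placeholders for whatever a site uses to pause
its cron jobs and queue runners, and the use of pkill is an assumption about
the platform.  It defaults to a dry run and is not a tested procedure.

```python
import subprocess

# Ordered steps from the message above. Only "ctl_deliver -r" comes from the
# thread; the other commands are hypothetical placeholders for a given site.
RECOVERY_STEPS = [
    ("disable queue-processing cron jobs",
     ["/usr/local/sbin/disable-queue-cron"]),    # hypothetical site script
    ("stop sendmail queue runners",
     ["/usr/local/sbin/stop-queue-runners"]),    # hypothetical site script
    ("kill lmtp daemons",
     ["pkill", "-x", "lmtpd"]),                  # assumes pkill is available
    ("run duplicate delivery database recovery",
     ["ctl_deliver", "-r"]),
]


def run_steps(steps, dry_run=True, runner=subprocess.run):
    """Run steps in order and stop at the first failure.

    Returns the descriptions of the steps that completed (or, in a dry
    run, the steps that would have been attempted).
    """
    done = []
    for description, argv in steps:
        if dry_run:
            print("would run:", " ".join(argv), "#", description)
        else:
            result = runner(argv)
            # A non-zero exit stops the sequence. Note that pkill also exits
            # non-zero when no process matched, which would stop it here.
            if result.returncode != 0:
                break
        done.append(description)
    return done


if __name__ == "__main__":
    run_steps(RECOVERY_STEPS)  # dry run: prints the plan, executes nothing
```

Stopping at the first failure is deliberate: recovery should not run while
something may still have the database open.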

So, does ctl_deliver actually clear out all the locks in the database as
part of the recovery operation?

> Possibly a system crash or a process crash/being killed at just the
> wrong time, due to the lack of transactions.

I don't know, but this strikes me as an extremely fragile system.
We seem to have database errors more often than not.  On a number of
occasions we have also caught lmtp processes getting stuck, spinning at 99.9%
CPU in the process table.  Attaching a debugger shows they are busy
waiting for a lock to become available.  We usually have to kill them off.
So, what kind of information should I provide to help track down the
problem?  It sounds like there is either a bug, or something has to be
added to make the db3 locking mechanism more robust and recoverable.
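
One thing I could collect is a list of the stuck processes at the moment
they are spinning.  A minimal sketch, assuming a ps that accepts
"-eo pid,pcpu,comm" and that the daemons show up under the name lmtpd; the
90% threshold and the function names are arbitrary choices for illustration:

```python
import subprocess


def find_spinning(ps_output, name="lmtpd", cpu_threshold=90.0):
    """Return (pid, cpu) pairs for processes named `name` whose CPU
    percentage is at or above `cpu_threshold`."""
    hits = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid, pcpu, comm = fields
        try:
            cpu = float(pcpu)
        except ValueError:
            continue  # header line or unparsable row
        # comm may be a bare name or a full path, depending on the system
        if comm.strip().endswith(name) and cpu >= cpu_threshold:
            hits.append((int(pid), cpu))
    return hits


def snapshot():
    """Capture one process listing (assumes ps supports -e and -o)."""
    return subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout


if __name__ == "__main__":
    for pid, cpu in find_spinning(snapshot()):
        print("lmtpd pid", pid, "at", cpu, "% CPU")
```

The pids it reports are the ones to attach a debugger to before killing
them off.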

Scott
--
 +-=-=-=-=-=-=-=-=-=+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=+=-=-=-=-=-=-=-=-+
      Scott W. Adkins                http://www.cns.ohiou.edu/~sadkins/
   UNIX Systems Engineer                  mailto:[EMAIL PROTECTED]
        ICQ 7626282                 Work (740)593-9478 Fax (740)593-1944
 +-=-=-=-=-=-=-=-=-=+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=+=-=-=-=-=-=-=-=-+
     CNS, HDL Center, Suite 301, Ohio University, Athens, OH 45701-2979
