Andrzej Zawadzki wrote:
My bacula just finished full backup of last server:

7488  Full     58,075    1.525 G  OK       21-lut-08 07:36 NSwiatrak

now time is:

Thu Feb 21 08:57:08 CET 2008

and bacula is waiting...

Running Jobs:
 JobId Level   Name                       Status
======================================================================
  7489         Tape-Eject.2008-02-19_04.00.55 is waiting for higher priority jobs to finish
  7490 Full    BackupCatalog.2008-02-19_04.30.56 is waiting execution
  7500 Full    BackupCatalog.2008-02-20_04.30.06 has been canceled
  7501 Increme  NSspichlerz.2008-02-21_00.05.11 is waiting for a mount request

Ah yes, that's a cute one. I have run into something similar already.
It does seem a bit counterintuitive at first, to put it mildly. :-)

What has happened here is that by the time your 2008-02-19 job had
finished, the job scheduled for 2008-02-21 was already due. As your
jobs "Tape-Eject" and "BackupCatalog" have lower priority than your
actual backup job "NSspichlerz" (as recommended in the Bacula manual),
job 7501 "skipped the queue" and was started before the earlier jobs
7489 and 7490.

So job 7501 is now waiting for its tape, but in vain, because the
previous tape has not been ejected. But job 7489, which should have
done that, is in turn waiting for job 7501 because that one has higher
priority, IOW "it should go first". Deadlock. And the catalog job 7490
is caught in between.
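
To make the circular wait explicit, here is a toy model in Python. It is
not Bacula code, just a sketch of who is waiting for whom. The
"tape change" node and the assumption that job 7490 is simply queued
behind the running job 7501 are my own simplifications.

```python
# Toy wait-for model of the jobs above; not Bacula code.
# Each key is blocked until the thing it maps to has happened.
waits_for = {
    "7501 NSspichlerz": "tape change",       # needs the next tape mounted
    "tape change": "7489 Tape-Eject",        # the old tape must be ejected first
    "7489 Tape-Eject": "7501 NSspichlerz",   # waits for higher priority jobs
    "7490 BackupCatalog": "7501 NSspichlerz" # assumption: queued behind 7501
}


def find_cycle(graph, start):
    """Follow the wait-for chain from start; return the cycle reached, or None."""
    path = []
    node = start
    while node in graph:
        if node in path:
            # We came back to a node already on the chain: that is the deadlock.
            return path[path.index(node):]
        path.append(node)
        node = graph[node]
    return None


cycle = find_cycle(waits_for, "7490 BackupCatalog")
print(" -> ".join(cycle + [cycle[0]]))
```

Job 7490 is not part of the cycle itself; it only hangs behind it, which
is why cancelling the jobs in the cycle is enough to let it run.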

Btw, I think it is a mistake to put the BackupCatalog job after the
Tape-Eject job. How can it back up the catalog if the tape has already
been ejected?

How to resolve the deadlock? Two possibilities:

a) If you want to complete the 2008-02-19 backup correctly:
Cancel job 7489 and any jobs after 7501 that might already be in a
"waiting" state. Then cancel job 7501. Job 7490 will be the next to
run, and back up your catalog, so the backup of 2008-02-19 will be
complete. Then unmount and eject the tape manually, mount the next
tape, and let the cycle resume.

b) If you prefer to have the 2008-02-21 job run now:
First cancel job 7490 (so that it can't interfere), then 7489, unmount
and eject the tape manually, cancel any jobs for dates after 2008-02-21
that may already be in "waiting" state, and then mount the next tape.
Job 7501 will resume, and afterwards everything will hopefully proceed
according to your schedule.

How to avoid that in the future? I have put that question to the list
once already. The only answer was to run jobs that might require
operator intervention (read: any backup job) only at times when an
operator is present (read: not on weekends or holidays). IOW: Bacula
doesn't like waiting. :-)

HTH
T.

--
Tilman Schmidt
Phoenix Software GmbH                             www.phoenixsoftware.de
53227 Bonn, Germany                            Amtsgericht Bonn HRB 2934


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users