On 15.07.20 00:05, Phil Stracchino wrote:
1. The triggering condition is when a DB record insertion fails for any
reason, *including recoverable failures* such as InnoDB rollbacks.
(MySQL uses rollbacks to notify the application of any of several types
of transient error, including deadlocks or lock wait time exceeded
during a transaction. Galera *additionally* uses InnoDB rollbacks to
notify the application of a local commit conflict between nodes.)
2. The correct action on receiving a rollback from a MySQL-compatible
DB, whether standalone *or* a Galera cluster, is to make at least one
attempt to resubmit the transaction. Instead, Bacula's MySQL driver
treats all errors as fatal and immediately aborts the entire job
without retrying the insert.
3. When the job is aborted because of a DB insertion error, it is
marked as having a fatal error but is not properly terminated. It
hangs indefinitely, potentially blocking other jobs that have lower
priority or are waiting for resources the hung job holds.
4. The fatal error status is correctly reported in bconsole by 'status
dir', but the jobs list in BAT shows the job as still running because it
has not terminated.
5. The failed job CAN be cancelled, but the cancellation takes a long
time to complete, and if a second job is cancelled before the first
cancellation has completed, the Director is extremely likely to crash.
6. The bug manifests only in 9.6.5.
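Point 2 describes resubmitting a transaction after a transient rollback
instead of aborting the job. A minimal sketch of that policy in C,
assuming a hypothetical callback type `db_exec_fn` that runs one
complete transaction and returns the server error number (0 on
success), in the spirit of `mysql_errno()`. The retryable codes are the
values of MySQL/MariaDB's ER_LOCK_DEADLOCK (1213, "Deadlock found when
trying to get lock; try restarting transaction") and
ER_LOCK_WAIT_TIMEOUT (1205); Galera commit conflicts are normally
reported with the deadlock code as well. All names and the attempt
limit are illustrative, not Bacula's actual driver code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Server error numbers for transient failures (the values of MySQL/MariaDB's
 * ER_LOCK_WAIT_TIMEOUT and ER_LOCK_DEADLOCK). Macro names are hypothetical. */
#define DB_ERR_LOCK_WAIT_TIMEOUT 1205u
#define DB_ERR_LOCK_DEADLOCK     1213u

/* Hypothetical callback: runs one complete transaction and returns 0 on
 * success or the server error number on failure (as mysql_errno() would). */
typedef unsigned int (*db_exec_fn)(void *ctx, const char *sql);

/* Transient errors mean "try restarting transaction"; anything else is fatal. */
static bool db_error_is_retryable(unsigned int err)
{
    return err == DB_ERR_LOCK_DEADLOCK || err == DB_ERR_LOCK_WAIT_TIMEOUT;
}

/* Runs the transaction up to max_attempts times (at least once), retrying
 * only on transient errors. Returns 0 on success, else the last error. */
static unsigned int db_exec_with_retry(db_exec_fn exec, void *ctx,
                                       const char *sql, int max_attempts)
{
    unsigned int err = 0;

    if (max_attempts < 1)
        max_attempts = 1;

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        err = exec(ctx, sql);
        if (err == 0)
            return 0;                 /* committed */
        if (!db_error_is_retryable(err))
            return err;               /* genuinely fatal: fail the job now */
        if (attempt < max_attempts)
            fprintf(stderr, "attempt %d/%d failed with error %u, retrying\n",
                    attempt, max_attempts, err);
    }
    return err;                       /* still failing after all attempts */
}
```

For a deadlock, InnoDB rolls back the entire transaction, so the
callback should resubmit every statement of the transaction, not only
the one that failed. A production version would probably also sleep
briefly, with some jitter, between attempts.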
Looking through my logs, I have seen the same behavior in my 9.6.5
installation on Debian 10:
Excerpts from the Error/Canceled mails:
-----
24-Jun 00:02 back-dir JobId 585301: Error: bdb.h:140 bdb.h:140 update
UPDATE Job SET JobStatus='R',Level='I',StartTime='2020-06-24
00:02:58',ClientId=73,JobTDate=1592949778,PoolId=7,FileSetId=427 WHERE
JobId=585301 failed:
Deadlock found when trying to get lock; try restarting transaction
24-Jun 00:02 back-dir JobId 585301: Fatal error: bdb.h:140 update UPDATE
Job SET JobStatus='R',Level='I',StartTime='2020-06-24
00:02:58',ClientId=73,JobTDate=1592949778,PoolId=7,FileSetId=427 WHERE
JobId=585301 failed:
Deadlock found when trying to get lock; try restarting transaction
-----
28-Jun 00:02 back-dir JobId 586375: Error: bdb.h:140 bdb.h:140 update
UPDATE Job SET JobStatus='R',Level='I',StartTime='2020-06-28
00:02:24',ClientId=149,JobTDate=1593295344,PoolId=7,FileSetId=153 WHERE
JobId=586375 failed:
Deadlock found when trying to get lock; try restarting transaction
28-Jun 00:02 back-dir JobId 586375: Fatal error: bdb.h:140 update UPDATE
Job SET JobStatus='R',Level='I',StartTime='2020-06-28
00:02:24',ClientId=149,JobTDate=1593295344,PoolId=7,FileSetId=153 WHERE
JobId=586375 failed:
Deadlock found when trying to get lock; try restarting transaction
-----
09-Jul 00:04 back-dir JobId 589511: Error: bdb.h:140 bdb.h:140 update
UPDATE Job SET JobStatus='R',Level='I',StartTime='2020-07-09
00:04:16',ClientId=255,JobTDate=1594245856,PoolId=7,FileSetId=408 WHERE
JobId=589511 failed:
Deadlock found when trying to get lock; try restarting transaction
09-Jul 00:04 back-dir JobId 589511: Fatal error: bdb.h:140 update UPDATE
Job SET JobStatus='R',Level='I',StartTime='2020-07-09
00:04:16',ClientId=255,JobTDate=1594245856,PoolId=7,FileSetId=408 WHERE
JobId=589511 failed:
Deadlock found when trying to get lock; try restarting transaction
-----
This is with a remote MariaDB 10.3 server, i.e. the database runs on
a separate host but is dedicated exclusively to Bacula.
Regards,
Sven.
_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel