On 1/15/21 8:21 AM, Radosław Korzeniewski wrote:
> Hello Phil,
> 
> Thu, 14 Jan 2021 at 19:19 Phil Stracchino <ph...@caerllewys.net
> <mailto:ph...@caerllewys.net>> wrote:
> 
>     I am increasingly convinced one of the changes in 9.6.7 introduced a bug
>     of some kind here, PROBABLY involving a race condition between
>     concurrent jobs.
> 
> 
> Could you please share your thoughts on this?


Well, unfortunately I don't have anything solid from the code yet; I've
been too busy to look into it.  But this class of failures is new in
9.6.7 vs 9.6.3.  (9.6.4-9.6.6 had other DB problems, at least against
MariaDB, that forced me to keep my Director at 9.6.3.)  Every few
days, two or three jobs fail, all with duplicate-entry errors on inserts
into either Filename or Path.  (The first group was on Filename inserts,
the next on Path.)  What is more, even though the jobs logged fatal
errors, they did not terminate.  They just stayed active in the
Director, open but failed and not running, and blocking later jobs.

These are the console errors from the first set of failures:

11-Jan 04:33 minbar-dir JobId 28397: Fatal error: sql_create.c:1015
Create db Filename record INSERT INTO Filename (Name) VALUES
('gjs-1.66.2.ebuild') failed. ERR=Duplicate entry 'gjs-1.66.2.ebuild'
for key 'Name'
11-Jan 04:33 minbar-dir JobId 28397: Fatal error: catreq.c:664 attribute
create error. ERR=sql_create.c:1015 Create db Filename record INSERT
INTO Filename (Name) VALUES ('gjs-1.66.2.ebuild') failed. ERR=Duplicate
entry 'gjs-1.66.2.ebuild' for key 'Name'
11-Jan 04:34 minbar-dir JobId 28400: Fatal error: sql_create.c:1015
Create db Filename record INSERT INTO Filename (Name) VALUES
('aegisub-3.2.2_p20160518-avoid-conveying-positional-parameters-to-source-builtin.patch')
failed. ERR=Duplicate entry
'aegisub-3.2.2_p20160518-avoid-conveying-positional-parameters...' for
key 'Name'
11-Jan 04:34 minbar-dir JobId 28400: Fatal error: catreq.c:664 attribute
create error. ERR=sql_create.c:1015 Create db Filename record INSERT
INTO Filename (Name) VALUES
('aegisub-3.2.2_p20160518-avoid-conveying-positional-parameters-to-source-builtin.patch')
failed. ERR=Duplicate entry
'aegisub-3.2.2_p20160518-avoid-conveying-positional-parameters...' for
key 'Name'

The second set, yesterday, was similar except that the inserts were into
Path.


I have not been able to investigate this in depth, but my
*recollection* of the last time I looked at this code is that Bacula's
MySQL driver checks whether the row exists, then inserts it if it
doesn't.  It seems to me the most likely explanation for this
behavior is a race condition occurring intermittently
between jobs backing up machines that received the same set of overnight
updates:  the second job checks that the row does not exist
before the first has inserted it, and then both try to insert.  (My
memory that there is a pre-check for existence could be mistaken.)
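
To make the suspected interleaving concrete, here is a minimal sketch of a
check-then-insert race.  It is only an illustration under assumptions:  it
uses Python's sqlite3 as a stand-in for MySQL/MariaDB, the table definition
is simplified rather than Bacula's actual schema, and the two "jobs" are
replayed in a fixed order on one connection instead of running concurrently.
The helper names are hypothetical and not taken from sql_create.c.

```python
import sqlite3


def make_catalog():
    # In-memory stand-in for the catalog (assumption: simplified schema).
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE Filename ("
        "FilenameId INTEGER PRIMARY KEY AUTOINCREMENT, "
        "Name TEXT NOT NULL UNIQUE)"
    )
    return db


def row_exists(db, name):
    # The pre-check step: is there already a row for this name?
    row = db.execute(
        "SELECT FilenameId FROM Filename WHERE Name = ?", (name,)
    ).fetchone()
    return row is not None


def simulate_race(name):
    db = make_catalog()
    # Both jobs run their existence check before either one inserts.
    checks = [("A", row_exists(db, name)), ("B", row_exists(db, name))]
    outcome = []
    for job, saw_row in checks:
        if saw_row:
            outcome.append((job, "skipped"))
            continue
        try:
            db.execute("INSERT INTO Filename (Name) VALUES (?)", (name,))
            outcome.append((job, "inserted"))
        except sqlite3.IntegrityError:
            # The unique index rejects the second insert.
            outcome.append((job, "duplicate"))
    return outcome


if __name__ == "__main__":
    print(simulate_race("gjs-1.66.2.ebuild"))
```

Both jobs pass the check because neither has inserted yet, so the second
insert hits the unique index, which matches the shape of the "Duplicate
entry ... for key 'Name'" errors above.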


At least for Path and Filename, the only two tables in which I have so
far seen this error, there are only two columns, the id and the Path or
Filename itself, and there are single-column unique indexes over Path
and Filename.  There are no secondary columns to affect.  Given that, my
first cut at fixing this would be to replace INSERTs into these two
tables with INSERT IGNORE:  a new row is created if the value does not
already exist in the table, and if it does, the operation becomes a
NOOP.  REPLACE INTO looks similar but is not equivalent:  on a duplicate
it deletes the existing row and inserts a new one, which assigns a new
id, so it is only safe if nothing references the old id.
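
A quick way to check how the two statements treat an existing row is to
compare its id before and after.  The sketch below is only an illustration
under assumptions:  it uses Python's sqlite3 as a stand-in for
MySQL/MariaDB, SQLite spells the first statement INSERT OR IGNORE rather
than INSERT IGNORE, and the table definition is simplified rather than
Bacula's actual schema.  MySQL documents REPLACE the same way (the old row
is deleted, then the new row is inserted), but this should be confirmed
against the real catalog before relying on it.

```python
import sqlite3

# SQLite spellings of the two candidate statements (assumption: stand-ins
# for MySQL's INSERT IGNORE and REPLACE INTO).
STATEMENTS = {
    "ignore": "INSERT OR IGNORE INTO Filename (Name) VALUES (?)",
    "replace": "REPLACE INTO Filename (Name) VALUES (?)",
}


def id_of(db, name):
    return db.execute(
        "SELECT FilenameId FROM Filename WHERE Name = ?", (name,)
    ).fetchone()[0]


def compare_duplicate_handling(name):
    results = {}
    for label, stmt in STATEMENTS.items():
        db = sqlite3.connect(":memory:")
        db.execute(
            "CREATE TABLE Filename ("
            "FilenameId INTEGER PRIMARY KEY AUTOINCREMENT, "
            "Name TEXT NOT NULL UNIQUE)"
        )
        # First job stores the name normally.
        db.execute("INSERT INTO Filename (Name) VALUES (?)", (name,))
        id_before = id_of(db, name)
        # Second job submits the same name with the candidate statement.
        db.execute(stmt, (name,))
        id_after = id_of(db, name)
        rows = db.execute("SELECT COUNT(*) FROM Filename").fetchone()[0]
        results[label] = {
            "id_before": id_before,
            "id_after": id_after,
            "rows": rows,
        }
    return results


if __name__ == "__main__":
    for label, info in compare_duplicate_handling("gjs-1.66.2.ebuild").items():
        print(label, info)
```

In both cases the table ends up with a single row and no error, but only
the ignoring insert keeps the original id.  If other tables store that id,
the REPLACE variant would leave them pointing at a row that no longer
exists.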


I will TRY to find time to look at the code and see whether I can patch
it myself and test.  I might get time today or this weekend.


-- 
  Phil Stracchino
  Babylon Communications
  ph...@caerllewys.net
  p...@co.ordinate.org
  Landline: +1.603.293.8485
  Mobile:   +1.603.998.6958


_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel
