On 1/15/21 8:21 AM, Radosław Korzeniewski wrote:
> Hello Phil,
>
> On Thu, 14 Jan 2021 at 19:19, Phil Stracchino <ph...@caerllewys.net> wrote:
>
> > I am increasingly convinced one of the changes in 9.6.7 introduced a bug
> > of some kind here, PROBABLY involving a race condition between
> > concurrent jobs.
>
> Could you share your thoughts on this? Please.

Well, unfortunately I don't have anything solid in the code; I've been too busy to look into it. But this class of failures is new in 9.6.7 vs 9.6.3. (9.6.4-9.6.6 had other DB problems, at least against MariaDB, that forced me to keep my Director at 9.6.3.)

Every few days, two to three jobs fail, all with duplicate-entry errors on inserts into either Filename or Path. (The first group was on Filename inserts, the next on Path.) What is more, even though the jobs logged fatal errors, they did not terminate. They just stayed active in the Director, open but failed and not running, and blocking later jobs.

These are the console errors from the first set of failures:

11-Jan 04:33 minbar-dir JobId 28397: Fatal error: sql_create.c:1015 Create db Filename record INSERT INTO Filename (Name) VALUES ('gjs-1.66.2.ebuild') failed. ERR=Duplicate entry 'gjs-1.66.2.ebuild' for key 'Name'
11-Jan 04:33 minbar-dir JobId 28397: Fatal error: catreq.c:664 attribute create error. ERR=sql_create.c:1015 Create db Filename record INSERT INTO Filename (Name) VALUES ('gjs-1.66.2.ebuild') failed. ERR=Duplicate entry 'gjs-1.66.2.ebuild' for key 'Name'
11-Jan 04:34 minbar-dir JobId 28400: Fatal error: sql_create.c:1015 Create db Filename record INSERT INTO Filename (Name) VALUES ('aegisub-3.2.2_p20160518-avoid-conveying-positional-parameters-to-source-builtin.patch') failed. ERR=Duplicate entry 'aegisub-3.2.2_p20160518-avoid-conveying-positional-parameters...' for key 'Name'
11-Jan 04:34 minbar-dir JobId 28400: Fatal error: catreq.c:664 attribute create error. ERR=sql_create.c:1015 Create db Filename record INSERT INTO Filename (Name) VALUES ('aegisub-3.2.2_p20160518-avoid-conveying-positional-parameters-to-source-builtin.patch') failed. ERR=Duplicate entry 'aegisub-3.2.2_p20160518-avoid-conveying-positional-parameters...' for key 'Name'

The second set, yesterday, was similar except that the inserts were into Path.

My job has kept me too busy to investigate this in depth, but my *recollection* of the last time I looked at this code is that Bacula's MySQL driver checks whether the row exists, then inserts it if it doesn't. It seems to me the most likely explanation for this behavior is a race condition occurring intermittently between jobs backing up machines that received the same set of overnight updates: the second job checks that the row does not exist before the first job has inserted it, and then both try to insert. (My memory that there is a pre-check for existence could be mistaken.)

At least for Path and Filename, the only two tables in which I have so far seen this error occur, there are only two columns, the id and the Path or Filename itself, and there are single-column unique indexes over Path and Filename. There are no secondary columns to affect.

Given that, my first cut at fixing this would be to replace INSERTs into these two tables with INSERT IGNORE: a new row will be created if the value does not already exist in the table, and if it does, the operation becomes a no-op. (REPLACE INTO looks similar but is not equivalent: on a duplicate key it deletes the existing row and inserts a new one, so, assuming the id is auto-generated, the row would come back with a new id.)

I will TRY to find time to look at the code and see whether I can patch it myself and test. I might get time today or this weekend.

-- 
Phil Stracchino
Babylon Communications
ph...@caerllewys.net
p...@co.ordinate.org
Landline: +1.603.293.8485
Mobile: +1.603.998.6958
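
The suspected check-then-insert race, and the effect of an ignore-on-duplicate insert, can be sketched outside Bacula. What follows is a minimal illustration under stated assumptions, not Bacula's code: it uses Python's sqlite3 module as a stand-in for MariaDB, it interleaves the two jobs by hand instead of using real concurrency, and every function name is hypothetical. The FilenameId column name is assumed, and SQLite spells the statement INSERT OR IGNORE where MySQL/MariaDB uses INSERT IGNORE.

```python
import sqlite3


def make_db():
    """In-memory table shaped like the one described: an id plus a unique Name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Filename ("
        "FilenameId INTEGER PRIMARY KEY AUTOINCREMENT, "  # assumed column name
        "Name TEXT NOT NULL UNIQUE)"
    )
    return conn


def name_exists(conn, name):
    row = conn.execute(
        "SELECT 1 FROM Filename WHERE Name = ?", (name,)
    ).fetchone()
    return row is not None


def simulate_check_then_insert(conn, name):
    """Both jobs run their existence check before either one inserts."""
    seen_by_job = {"A": name_exists(conn, name), "B": name_exists(conn, name)}
    errors = []
    for job, seen in seen_by_job.items():
        if not seen:  # each job believes the row is missing
            try:
                conn.execute("INSERT INTO Filename (Name) VALUES (?)", (name,))
            except sqlite3.IntegrityError as exc:
                errors.append((job, str(exc)))
    return errors


def simulate_insert_ignore(conn, name):
    """Both jobs insert unconditionally; the duplicate is silently skipped."""
    for _job in ("A", "B"):
        conn.execute("INSERT OR IGNORE INTO Filename (Name) VALUES (?)", (name,))
    return conn.execute(
        "SELECT COUNT(*) FROM Filename WHERE Name = ?", (name,)
    ).fetchone()[0]


def ids_around_replace(conn, name):
    """Return the row id before and after a REPLACE INTO on an existing Name."""
    query = "SELECT FilenameId FROM Filename WHERE Name = ?"
    conn.execute("INSERT OR IGNORE INTO Filename (Name) VALUES (?)", (name,))
    before = conn.execute(query, (name,)).fetchone()[0]
    conn.execute("REPLACE INTO Filename (Name) VALUES (?)", (name,))
    after = conn.execute(query, (name,)).fetchone()[0]
    return before, after


if __name__ == "__main__":
    print(simulate_check_then_insert(make_db(), "gjs-1.66.2.ebuild"))
    print(simulate_insert_ignore(make_db(), "gjs-1.66.2.ebuild"))
    print(ids_around_replace(make_db(), "gjs-1.66.2.ebuild"))
```

In this sketch the second job's plain INSERT fails on the unique index, which matches the shape of the duplicate-entry errors in the log, while the ignore variant leaves exactly one row and raises nothing. The last helper shows that, in SQLite at least, REPLACE INTO hands the row a new id; MySQL documents REPLACE as a delete followed by an insert as well, which matters if other tables refer to these ids.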