>>>>> On Sun, 21 Jun 2020 09:08:49 -0400, Phil Stracchino said:
>
> On 2020-06-20 14:33, Phil Stracchino wrote:
> > OK, two days with zero hung jobs. I am proceeding with re-upgrading
> > ONLY the Director (well, and that host's FD) to 9.6.5.
>
> That got me three successful jobs, one failed, and two hung. Here's the
> failure:
>
> 21-Jun 04:30 minbar-dir JobId 25026: Fatal error: sql_create.c:968 Create db File record INSERT INTO File (FileIndex,JobId,PathId,FilenameId,LStat,MD5,DeltaSeq) VALUES (2,25026,122083,109,'R0AAQAM E EHt H A A -B H IA F Be7wj5 BciQ9i BciQ9i A A C','0',0) failed. ERR=Deadlock found when trying to get lock; try restarting transaction
> 21-Jun 04:30 minbar-dir JobId 25026: Fatal error: catreq.c:513 Attribute create error: ERR=sql_create.c:968 Create db File record INSERT INTO File (FileIndex,JobId,PathId,FilenameId,LStat,MD5,DeltaSeq) VALUES (2,25026,122083,109,'R0AAQAM E EHt H A A -B H IA F Be7wj5 BciQ9i BciQ9i A A C','0',0) failed. ERR=Deadlock found when trying to get lock; try restarting transaction
> 21-Jun 04:30 asgard-fd JobId 25026: Error: bsock.c:383 Write error sending 79 bytes to Storage daemon:asgard.caerllewys.net:9103: ERR=Broken pipe

BTW, did you cancel the job in this case or did it crash by itself?

> Once again it is somehow creating a local commit conflict on the
> cluster. I CAN configure Bacula to send all transactions to a single
> node of the cluster instead of load-balancing them; however, Bacula
> SHOULD be detecting reported deadlocks and retrying on its own.

I don't think Bacula has any code that retries after deadlocks, mainly
because it doesn't expect them (nothing else should be writing to the
database, and it takes care not to create deadlocks between threads).

__Martin

_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel
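
For reference, the "detect the reported deadlock and retry" behaviour discussed above could look roughly like the sketch below. This is not Bacula code (Bacula is not written in Python, and as noted it has no such retry logic); it only illustrates the general pattern. The names DbError and run_with_deadlock_retry are hypothetical stand-ins, and the only fact taken from the database side is that MySQL and MariaDB report "Deadlock found when trying to get lock; try restarting transaction" as error 1213.

```python
import time

# MySQL/MariaDB report "Deadlock found when trying to get lock; try
# restarting transaction" with error code 1213 (ER_LOCK_DEADLOCK).
ER_LOCK_DEADLOCK = 1213


class DbError(Exception):
    """Hypothetical stand-in for a database driver's error type."""

    def __init__(self, errno, msg):
        super().__init__(msg)
        self.errno = errno


def run_with_deadlock_retry(txn, max_attempts=5, base_delay=0.05,
                            sleep=time.sleep):
    """Run txn() and re-run it if the server reports a deadlock.

    txn must perform the whole transaction (begin .. commit), so that a
    retry starts again from a clean state. Any other error, or a
    deadlock on the final attempt, is re-raised to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DbError as exc:
            if exc.errno != ER_LOCK_DEADLOCK or attempt == max_attempts:
                raise
            # Back off a little longer after each failed attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

The important design point is that the unit being retried is the whole transaction, not the single failed INSERT: after a deadlock the server has already rolled the transaction back, so any earlier statements in it have to be replayed as well.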