Hi again,

I forced the errors we've been chasing to recur on my patched-up bacula-3.0.3 install by reducing PostgreSQL's maximum connections to 4 and running 12 backup jobs simultaneously-ish (started with a bash for loop piped into bconsole). I can confirm that the PostgreSQL error is now being logged correctly, but I'm not 100% sure it's being handled correctly.
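For anyone wanting to reproduce this, here is a minimal sketch of that kind of setup. It assumes bash and a bconsole that is already configured to reach the Director. The `gen_run_commands` helper and the `JOBS` list are hypothetical: only four of the twelve job names appear in the logs below, so substitute your own. `max_connections` is the standard postgresql.conf setting, and changing it requires a PostgreSQL restart.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction sketch: the helper name and job list are
# illustrative, not the exact loop used for the test described here.
#
# Server side (assumption): in postgresql.conf set
#   max_connections = 4
# then restart PostgreSQL so the new limit takes effect.

# Print one non-interactive bconsole "run" command per job name;
# the trailing "yes" answers bconsole's confirmation prompt.
gen_run_commands() {
  local job
  for job in "$@"; do
    printf 'run job=%s yes\n' "$job"
  done
}

# Job names that appear in the logs; the rest of the 12 are not named.
JOBS=(graham-desktop rmarst-desktop richard-desktop norman-desktop)

# Only talk to the Director if bconsole is actually installed.
if command -v bconsole >/dev/null 2>&1; then
  gen_run_commands "${JOBS[@]}" | bconsole
fi
```

Feeding every `run` command through a single bconsole session queues the jobs almost at once. Whether they actually run in parallel still depends on the Maximum Concurrent Jobs settings in the Director, Storage and Client resources.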
Of the 12 jobs started, 6 completed successfully, three correctly cancelled themselves because they could not establish a connection to PostgreSQL, and three are still listed by the director as "Running", although they are in the same "Fatal Error" state as usual. One of these three cannot be cancelled, because the director says:

2901 Job rmarst-desktop.2009-11-04_12.10.10_06 not found.
3904 Job rmarst-desktop.2009-11-04_12.10.10_06 not found.

The other two cause the same bconsole "hang" as seen before when I attempt to cancel them. After restarting the SD, one of the three jobs successfully transitioned away from the "Running" state; the other two still could not be cancelled, failing in the same way as above. After restarting the director, these jobs vanished from the console without a trace, but their errors were logged to backup.log.

Here is an example of the error log of one of the jobs that cancelled successfully:

04-Nov 12:10 bksrv0-dir JobId 327: Start Backup JobId 327, Job=graham-desktop.2009-11-04_12.10.12_20
04-Nov 12:10 bksrv0-dir JobId 327: Using Device "graham-desktop"
04-Nov 12:10 bksrv0-sd JobId 327: Volume "graham-desktop-0099" previously written, moving to end of data.
04-Nov 12:10 bksrv0-sd JobId 327: Ready to append to end of Volume "graham-desktop-0099" size=7033783439
04-Nov 12:11 bksrv0-dir JobId 327: Fatal error: sql.c:748 sql.c:747 Could not open database "bacula": ERR=postgresql.c:234 Unable to connect to PostgreSQL server. Database=bacula User=bacula It is probably not running or your password is incorrect.
04-Nov 12:11 bksrv0-dir JobId 327: Fatal error: catreq.c:488 Attribute create error. Query failed: DROP TABLE DelCandidates: ERR=ERROR: table "delcandidates" does not exist
04-Nov 12:11 bksrv0-sd JobId 327: Job graham-desktop.2009-11-04_12.10.12_20 marked to be canceled.
04-Nov 12:11 bksrv0-sd JobId 327: Fatal error: fd_cmds.c:177 FD command not found: 112 1 0
04-Nov 12:11 bksrv0-sd JobId 327: Job write elapsed time = 00:01:35, Transfer rate = 348.2 K bytes/second
04-Nov 12:11 bksrv0-sd JobId 327: Fatal error: append.c:292 Fatal append error on device "graham-desktop" (/backup/volumes/graham-desktop/): ERR=
04-Nov 12:11 bksrv0-sd JobId 327: Fatal error: fd_cmds.c:166 Command error with FD, hanging up. Append data error.
04-Nov 12:11 graham-desktop JobId 327: Fatal error: backup.c:964 Network send error to SD. ERR=Connection reset by peer

Each of the three other jobs logged different error messages, caused by the restart of the storage daemon:

04-Nov 13:27 richard-desktop JobId 325: Fatal error: backup.c:1108 Network send error to SD. ERR=Input/output error
04-Nov 13:47 bksrv0-dir JobId 325: Fatal error: bsock.c:488 Packet size too big from "Storage daemon:bksrv0:9103. Terminating connection.

04-Nov 13:23 norman-desktop JobId 322: Fatal error: backup.c:964 Network send error to SD. ERR=Input/output error
04-Nov 13:47 bksrv0-sd JobId 322: Fatal error: append.c:243 Network error on data channel. ERR=Connection reset by peer
04-Nov 13:47 bksrv0-sd JobId 322: Job write elapsed time = 01:36:49, Transfer rate = 37 bytes/second
04-Nov 13:47 bksrv0-sd JobId 322: Fatal error: append.c:292 Fatal append error on device "norman-desktop" (/backup/volumes/norman-desktop/): ERR=
04-Nov 13:47 bksrv0-dir JobId 322: Error: bsock.c:518 Read error from Storage daemon:bksrv0:9103: ERR=No data available

04-Nov 12:10 bksrv0-sd JobId 320: Job rmarst-desktop.2009-11-04_12.10.10_06 marked to be canceled.
04-Nov 12:18 bksrv0-sd JobId 320: Fatal error: fd_cmds.c:177 FD command not found: 12 7 0
04-Nov 12:18 bksrv0-sd JobId 320: Job write elapsed time = 00:07:52, Transfer rate = 3 bytes/second
04-Nov 12:18 bksrv0-sd JobId 320: Fatal error: append.c:292 Fatal append error on device "rmarst-desktop" (/backup/volumes/rmarst-desktop/): ERR=
04-Nov 12:18 bksrv0-sd JobId 320: Fatal error: fd_cmds.c:166 Command error with FD, hanging up. Append data error.
04-Nov 12:18 rmarst-desktop JobId 320: Fatal error: backup.c:1068 Network send error to SD. ERR=Connection reset by peer

I'm not sure whether all of this is useful. If there's anything else you'd like me to try to help narrow down what's going on, just let me know!

--Alex