Since we still have an open bug, please add this to the bug report.

Kern

On Saturday 20 March 2010 00:06:01 Hugh Brown wrote:
> Kern Sibbald wrote:
> > OK, I think the solution is for Hugh to:
> > 1. Figure out why his alert command is broken
> > 2. Create a script with a timer
> > 3. Disable the alert
>
> Here's what I've done:
>
> -- Ran backups, no change; got a hang. Restarted sd and director.
>
> -- Commented out the "Alert" sections in bacula-sd for the two tape
> drives. Ran backups, no hang.
>
> -- Changed the Alert section to:
>
> Alert Command = "sh -c '/etc/bacula/alert_debugging.pl %c'"
>
> which was a very simple perl script (attached). I reran backups and
> got a hang. Here's what was logged:
>
> Mar 19 15:30:24 agnatha hugh[29410]: Parent here...waiting patiently
> Mar 19 15:30:24 agnatha hugh[29409]: About to run /usr/sbin/smartctl -H -l error -q errorsonly -d scsi /dev/changer
> Mar 19 15:30:24 agnatha hugh[29414]: Done: exit status 0
>
> This was the same entries as seen before, since a bunch of jobs ran
> and finished before the hang, but now I've got the two processes
> again:
>
> bacula 28827 1.1 0.0 248588 6220 ? Ssl 15:24 0:18 /usr/sbin/bacula-sd -u bacula -g disk -c /etc/bacula/bacula-sd.conf
> bacula 29422 0.0 0.0 258816 4492 ? S 15:30 0:00 /usr/sbin/bacula-sd -u bacula -g disk -c /etc/bacula/bacula-sd.conf
>
> I ran btraceback on both (attached). I can't find any mention of
> PID 29422 (the child) in the traceback for 28827 (the parent). The
> child appears to be hung at closelog(), which matches what I had in
> the original traceback I sent with my first message to the list. I
> ran kill -6 on both, but only the parent produced a lock dump
> (attached). If I should add these files to the bug, let me know.
>
> The bacula I'm running now is compiled from the sourceforge SRPM, but
> with the changes detailed in bug #1527:
>
> -- the patch removing the "debug_list_volumes" line
> -- enabling the lock manager in the args to configure
> -- and enabling developer mode in version.h
>
> I thought maybe that last one was causing problems, since it results
> in stdout not being closed, but I was having this problem with the
> stock (though still locally-compiled) SRPM.
>
> I'm thoroughly confused. For the weekend I'll just be removing the
> alert section and letting things run.
>
> --
> Hugh Brown, Systems Manager
> The Centre for High-Throughput Biology
> [email protected]
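
For item 2 in the list above (a script with a timer), something along these lines might serve as a starting point. It is only a sketch and not the attached alert_debugging.pl: the 30-second limit, the function names, and the exit code 124 on timeout are assumptions made for illustration. The smartctl arguments are the ones shown in the syslog excerpt.

```python
#!/usr/bin/env python3
"""Sketch: run the smartctl health check under a time limit."""
import subprocess
import sys

TIMEOUT_S = 30  # assumed limit, not taken from the thread


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (exit_status, timed_out).

    subprocess.run() kills the child itself when the timeout expires,
    so the caller is not left waiting on a stuck command.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None, True
    return proc.returncode, False


def main(argv):
    """argv[1] is the device, i.e. what %c expands to in Alert Command."""
    device = argv[1]
    # Same smartctl invocation as in the logged output.
    cmd = ["/usr/sbin/smartctl", "-H", "-l", "error",
           "-q", "errorsonly", "-d", "scsi", device]
    status, timed_out = run_with_timeout(cmd, TIMEOUT_S)
    if timed_out:
        return 124  # hypothetical "timed out" status for this sketch
    return status


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Saved under a hypothetical name such as /etc/bacula/alert_timer.py, it would be called the same way as the debugging script, with %c as its only argument. Note that the hang reported here is in the forked bacula-sd child at closelog(), so a timer around smartctl may not help with that, but it would rule out the alert command itself blocking.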
