>>>>> On Mon, 07 Dec 2009 14:30:41 +0800, Jim Barber said:
> 
> Hi all.
> 
> I have a problem where every weekend (or more frequently) my storage daemon 
> crashes.
> The timing is random, but so far it has only happened while running 
> VirtualFull jobs or Copy jobs.
> So far it hasn't crashed during regular incremental backups.
> 
> I am running version 3.0.3 of the Bacula software.
> 
> First of all, I tried adding '-d 200' to the arguments that start bacula-sd.
> This produced a lot of messages, but nothing unusual that I can see prior to 
> the crash.
> The last few lines in this log look like so:
> 
>       vc-sd: mac.c:241-468 before write JobId=468 FI=363302 SessId=1 Strm=MD5 len=16
>       vc-sd: mac.c:241-468 before write JobId=468 FI=363303 SessId=1 Strm=UATTR len=104
>       vc-sd: mac.c:241-468 before write JobId=468 FI=363304 SessId=1 Strm=UATTR len=122
>       vc-sd: mac.c:241-468 before write JobId=468 FI=363305 SessId=1 Strm=UATTR len=77
>       vc-sd: mac.c:241-468 before write JobId=468 FI=363305 SessId=1 Strm=DATA len=4496
>       vc-sd: mac.c:241-468 before write JobId=468 FI=363305 SessId=1 Strm=MD5 len=16
> 
> So next I have been trying to get the btraceback program running.
> 
> I am using Debian packages (self built based on the 3.0.2 Debian sources).
> These run the storage daemon under the bacula:tape user:group.
> So I modified the btraceback program to use sudo to run gdb.
> I also configured sudo to allow the bacula user to do so without being 
> prompted for a password.
> I then modified the Debian sources so that packages with debugging symbols 
> are produced.
> 
> If I become the bacula user and run a test like so:
> 
>       /usr/sbin/btraceback /usr/sbin/bacula-sd $PID
> 
> Where: $PID = the process ID of the bacula-sd process,
> then I get an email showing debugging information.
> So as far as I can tell the btraceback program should be working.
> 
> I had another crash of the storage daemon after making the changes and no 
> email was sent.
> Nor was a bacula-sd.9103.traceback file produced.
> So I can't send any useful information to try and track down why the storage 
> daemon is so unstable.
> 
> The daemon was also unstable with the 3.0.2 Debian package, so I don't 
> think it is my rebuild that is causing the issue.
> Although I feel 3.0.3 is more stable than 3.0.2 was, I still can't get 
> through a complete week's cycle without a crash.
> 
> The /etc/init.d/bacula-sd script sets PATH=/sbin:/bin:/usr/sbin:/usr/bin,
> so /usr/sbin is in the PATH and I'd expect the daemon to be able to find 
> the btraceback program.
> 
> Any ideas how I can get some useful information from the crash?

Try doing it interactively by attaching gdb to the bacula-sd process before it
crashes (run gdb /path/to/bacula-sd and then use gdb's attach command).  Then
use the commands in btraceback.gdb when it crashes.
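
A rough sketch of such a session, written as a small script so gdb can be
left attached until the crash. It assumes the /usr/sbin/bacula-sd path from
the original post and that gdb is run as root (or through sudo) so it is
allowed to attach. The scratch and log file names are made up for
illustration, and the SIGUSR2 line is only a guess that the daemon may use
that signal internally; drop it if that turns out not to be the case.

```shell
#!/bin/sh
# Sketch only: attach gdb to a running bacula-sd and wait for the crash.
# The two /tmp file names are hypothetical; adjust to taste.

SD_BIN="${SD_BIN:-/usr/sbin/bacula-sd}"          # path used in the original post
GDB_CMDS="${GDB_CMDS:-/tmp/bacula-sd-wait.gdb}"  # hypothetical scratch file
GDB_LOG="${GDB_LOG:-/tmp/bacula-sd-crash.log}"   # hypothetical output file

# Write the commands gdb runs after attaching. "continue" blocks until the
# daemon stops (for example on SIGSEGV); the backtrace commands then run.
# The "handle" line is an assumption: it keeps gdb from stopping on SIGUSR2
# in case the daemon uses that signal for its own purposes.
write_gdb_commands() {
    cat > "$1" <<'EOF'
set pagination off
handle SIGUSR2 nostop noprint pass
continue
bt
thread apply all bt
EOF
}

# Attach to the given PID and capture everything gdb prints.
attach_and_wait() {
    write_gdb_commands "$GDB_CMDS" || return 1
    gdb -batch -x "$GDB_CMDS" "$SD_BIN" "$1" > "$GDB_LOG" 2>&1
}

# Only attach when SD_PID is set, so reading the file in has no side effects.
if [ -n "${SD_PID:-}" ]; then
    attach_and_wait "$SD_PID"
fi
```

Run it with SD_PID set to the process ID of bacula-sd and leave it running;
after the crash the backtraces should be in the log file. For fuller output,
the two bt lines could be replaced by the contents of btraceback.gdb, whose
location depends on how the package was built.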

__Martin

