Hello, I just subscribed to the list, though I've been happily using Bacula for about a year now. Last month I ran into my first serious problem, and I'm not sure how to troubleshoot it.

I'd been using version 1.36.2 on an SMP machine running FreeBSD 5.4 (i386 platform) backing up several different machines on a network to a hardware RAID array. It worked great.

Then I decided to put the backup responsibilities on a different machine. This one is a single-processor AMD machine that also runs the i386 port of FreeBSD 5.4. It's headless, and it also runs a webserver, mailserver, firewall, and several other "always on" services, so I thought it would be good to give it the backup responsibilities too. I got a SATA controller and a couple of 200GB drives to hold the backups, mirrored them with gmirror, and built Bacula 1.36.3 from the ports collection with the default options, since that was the new current version. I chose the default SQLite database, since I was familiar with it and had always been happy with it before. I copied my old configuration over to the new machine, changing names and addresses where appropriate. I'm not using any graphical components at all.

Then the problems started.

Nearly every time I attempted a full backup, the director daemon would (a) silently terminate, (b) cause the system to hang, or (c) reboot the system. There was never anything in the Bacula log, syslog, or the console message log. It doesn't matter whether the job starts automatically or manually from bconsole. The likelihood of a problem seems directly proportional to the size of the fileset.

My first suspicion was hardware, since the new and old machines were running the same OS and almost the same version of Bacula with almost the same configuration.

First I replaced my hub with a switch, since I was getting tons of packet collisions. This improved my traffic situation, but then I realized that the director would sometimes die even on a totally local backup, so that rules out network problems.

Next I suspected a problem with the new controller, drives, or gmirror configuration. I stress-tested these drives as much as I could, copying huge amounts of data in several different threads all at the same time, pushing the drives to the limits according to gstat, but never had any problems. I'm not ruling out bad hardware or gmirror problems, but if that is the problem, I don't know how to prove it. Simply loading down the drives with prolonged heavy write activity doesn't seem to cause a problem.
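
A load test like the one above shows only that the drives keep up, not that the data comes back intact. As a rough illustration of a stricter check, here is a minimal sketch that writes random data from several threads at once and compares checksums on read-back. The function names, worker count, and file sizes are all hypothetical, and the demo writes to a temporary directory rather than the mirror itself. This is not what I actually ran, and a read-back may be served from cache rather than the disk, so a pass is not proof of good hardware.

```python
import hashlib
import os
import tempfile
import threading

BLOCK = 1024 * 1024  # 1 MiB per write


def write_and_verify(path, size_mb, results, index):
    """Write size_mb MiB of random data to path, read it back, compare SHA-256."""
    written = hashlib.sha256()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            block = os.urandom(BLOCK)
            written.update(block)
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(BLOCK), b""):
            read_back.update(block)
    results[index] = written.digest() == read_back.digest()


def stress(directory, workers=4, size_mb=8):
    """Run several write-and-verify workers in parallel; return one bool per worker."""
    results = [False] * workers
    threads = []
    for i in range(workers):
        path = os.path.join(directory, "stress_%d.bin" % i)
        t = threading.Thread(target=write_and_verify,
                             args=(path, size_mb, results, i))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Hypothetical demo: a temporary directory stands in for the mirrored volume.
    with tempfile.TemporaryDirectory() as scratch:
        print(stress(scratch))
```

To exercise the real array, the directory would have to be one on the gmirror volume, with sizes large enough to exceed RAM so that reads actually hit the disks.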

Then I decided to upgrade all the file daemons on my network from 1.36.2 to 1.36.3, just in case there was some compatibility problem between the two versions. No change.

Through sheer persistence and luck, I managed to get Bacula to make full backups of all the machines on my network. I left Bacula running, and it ran fine for most of a month doing small incremental backups... but when it came time for some new full backups, the system hung again.

Next I started over from scratch and tried a different database. I already had SQLite3 on this machine and I thought perhaps there was some conflict with the SQLite2 that Bacula used. I switched from SQLite to PostgreSQL 8.0. No change. The director still usually terminates, hangs the system, or reboots the system soon after I begin a full backup of anything.

I haven't tried getting a traceback yet. I thought I'd ask how to proceed before I crash my server any more. I've gone several pages into the bugs database and don't see anything relevant that hasn't already been fixed. I have to believe that this is one of three things: a hardware/OS problem that I don't know how to isolate, some bizarre configuration problem that this machine has and the other machine did not, or a difference between 1.36.2 and 1.36.3.

Thank you for this great software and any hints on how I could proceed.


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
