How do you do.

>> Full backups every Sunday and incrementals daily. But
>> after some time one of the Bacula processes started to crash every morning
>> and one or more (or all) jobs were left undone. This situation has lasted
>> for some weeks - it became clear to me that I need help.

>> and every morning the Director's process, bacula-dir, is missing.

AL> That's bad.

Yes, the most obvious sign of a problem...

>> The end of the log from the last morning looks like this
>> # less /var/db/bacula/log
AL> ... skip some log output...
>> 02-Nov 03:15 nfs4p-dir: Start Backup JobId 3960, Job=sinux-oracle.2005-11-02_03.15.00
>> 02-Nov 03:15 sinux-fd: ClientRunBeforeJob: -su: line 8: ulimit: max user processes: cannot modify limit: Operation not permitted

AL> That one indicates a problem, I guess. There seems to be a limit on the
AL> number of processes a user can have running. Some script or program 
AL> tries to increase that limit.
AL> You should investigate the script that is called as Client Run Before 
AL> Job script for the job sinux-oracle. Just to point out the obvious: That
AL> script is not on the director machine (probably nfs-4p) but on sinux.

That's an old (and by now not very important) Oracle server that's
used in development. It runs something before and after the backup. It
worked OK before, and even now the backup job for sinux finishes with
"Termination:            Backup OK"

The guys who use that machine will look at these strange messages
anyway. Thank you, Arno!
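
For whoever checks that script on sinux: as far as I understand, an
unprivileged process may raise its soft limit only up to the hard limit,
and anything beyond that needs root, which would fit the "Operation not
permitted" message. A minimal Python sketch of that rule follows. It
assumes a Unix system where the resource module offers RLIMIT_NPROC (not
every platform does), and the function name is made up for illustration.

```python
import resource


def can_raise_without_root(requested, hard):
    """Return True if an unprivileged process may set its soft limit to `requested`.

    Rule assumed here: the soft limit may go up to, but not above, the hard limit.
    """
    if hard == resource.RLIM_INFINITY:
        return True   # no ceiling at all
    if requested == resource.RLIM_INFINITY:
        return False  # "unlimited" exceeds any finite hard limit
    return requested <= hard


if __name__ == "__main__":
    # RLIMIT_NPROC is the "max user processes" limit on Linux and the BSDs;
    # it is missing on some platforms, hence the getattr guard.
    nproc = getattr(resource, "RLIMIT_NPROC", None)
    if nproc is None:
        print("RLIMIT_NPROC is not available on this platform")
    else:
        soft, hard = resource.getrlimit(nproc)
        print("soft:", soft, "hard:", hard)
```

Run as the same user that the Client Run Before Job script switches to,
this would show whether the value requested on line 8 of the script is
above the hard limit.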

AL> That situation *could* indicate a serious security problem, even a 
AL> compromised database server. Good luck.

AL> ...
AL> more output
>> 
>> "That's all, folks!" (c) :-(

That's how it died on the cgatex job last time.

02-Nov 03:21 nfs4p-dir: Begin pruning Jobs.
02-Nov 03:21 nfs4p-dir: No Jobs found to prune.
02-Nov 03:21 nfs4p-dir: Begin pruning Files.
02-Nov 03:21 nfs4p-dir: No Files found to prune.
02-Nov 03:21 nfs4p-dir: End auto prune.

02-Nov 07:05 nfs4p-dir: Start Backup JobId 3961, Job=cgatex-full.2005-11-02_07.05.00
02-Nov 07:05 cgatex-fd-fd: Since time adjusted by 0 seconds.
02-Nov 07:05 s10-sd: Volume "Vol0086" previously written, moving to end of data.
02-Nov 07:06 s10-sd: User defined maximum volume capacity 734,003,200 exceeded on device /d/0/bacula.
02-Nov 07:06 s10-sd: End of medium on Volume "Vol0086" Bytes=733,941,548 Blocks=11,378 at 02-Nov-2005 07:06.
02-Nov 07:06 nfs4p-dir: Recycled volume "Vol0087"
...
...
02-Nov 07:35 s10-sd: Recycled volume "Vol0091" on device "/d/0/bacula", all previous data lost.
02-Nov 07:35 s10-sd: New volume "Vol0091" mounted on device /d/0/bacula at 02-Nov-2005 07:35.
02-Nov 08:04 s10-sd: User defined maximum volume capacity 734,003,200 exceeded on device /d/0/bacula.
02-Nov 08:04 s10-sd: End of medium on Volume "Vol0091" Bytes=733,952,897 Blocks=11,377 at 02-Nov-2005 08:04.
03-Nov 01:05 nfs4p-dir: Start Backup JobId 3962, Job=nfs4p.2005-11-03_01.05.00
03-Nov 01:05 nfs4p-fd: Since time adjusted by -1095 seconds.

I don't understand the last line...

>> 
>> I run
>> 
>> # /etc/bacula/bconsole
>> 
>> and see
>> 
>> Connecting to Director 127.0.0.1:9101
>> 1000 OK: nfs4p-dir Version: 1.36.1 (26 November 2004)
>> Enter a period to cancel a command.
>> *status 1
>> Using default Catalog name=MyCatalog DB=bacula
>> Automatically selected Storage: File
>> Connecting to Storage daemon File at 10.253.4.15:9103
>> 
>> s10-sd Version: 1.36.1 (26 November 2004) i386-pc-solaris2.10 solaris 5.10
>> Daemon started 02-Nov-05 20:10, 0 Jobs run since started.
>> 
>> Running Jobs:
>> No Jobs running.
>> ====
>> 
>> Terminated Jobs:
>>  JobId  Level   Files          Bytes Status   Finished        Name
>> ======================================================================
>>   3952  Incr      2,462      1,889,217 OK       02-Nov-05 01:21 ns02
>>   3953  Incr          1     33,512,324 OK       02-Nov-05 01:22 sinux
>>   3954  Incr         83     28,008,073 OK       02-Nov-05 01:23 dbh1-matroska
>>   3955  Incr          0              0 OK       02-Nov-05 01:23 dbh1-configs
>>   3956  Incr          0              0 OK       02-Nov-05 01:23 dbh1-home
>>   3957  Incr      1,418     84,707,857 OK       02-Nov-05 01:28 hpov-full
>>   3958  Incr         67    615,990,101 OK       02-Nov-05 01:37 dbh2-full
>>   3959  Full          1    186,384,929 OK       02-Nov-05 01:51 BackupCatalog
>>   3960  Full          5    181,932,043 OK       02-Nov-05 03:21 sinux-oracle
>>   3961  Incr      9,889  2,246,582,808 Cancel   02-Nov-05 10:22 cgatex-full
>> ====
>> 
>> Device status:
>> Device "/d/0/bacula" is not open.
>> ====
>> 
>> The last job is the most important - it's the mail server... :-(

AL> It looks like that job hasn't failed but got cancelled - that status 
AL> should, as far as I know, only happen as a direct result of user 
AL> intervention.

No one but me could have intervened, and I did not. There was no manual
cancel... that would be too simple...

I wonder what is going on at the moment the DIR goes down...
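
One way to narrow down the time of death would be to probe the Director's
port regularly (from cron, for example) and log the result. A small
sketch, assuming the address 127.0.0.1:9101 that bconsole reports below;
the function name is hypothetical, and a successful TCP connect only shows
that something is listening, not that the Director is healthy.

```python
import socket


def director_is_listening(host="127.0.0.1", port=9101, timeout=3.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    import time

    stamp = time.strftime("%d-%b %H:%M")
    state = "up" if director_is_listening() else "DOWN"
    print(stamp, "bacula-dir port 9101 is", state)
```

Appending that output to a file every few minutes would show between which
two log entries the process disappears.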

>> If I leave this console open till the next morning and try to enter any
>> command after bacula-dir crashes, it will die too, being unable to
>> connect to the Director.

AL> Ok, the DIR dies during the night.

AL> You can do the following:
AL> Either run the director with debug output enabled and capture the 
AL> output. You'd call it with something like "./bacula-dir -v -d 200 -c 
AL> /etc/bacula/bacula-dir >>/var/log/bacula-dir.output". Adjust paths and
AL> debug level to your needs... a debug level of 100 gives a good overview
AL> of the program flow, 400 results in lots and lots of details, and 900 
AL> gives you more than you will need to locate the problem, I guess.
AL> After the DIR crashes, you should investigate the last lines of the 
AL> output, probably post it here. Perhaps it helps to locate the problem.

Thank you very much for the advice! I've just adjusted the startup
script for Bacula. I think I'll start with -d 200 as you recommend and
see what happens ...
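
Since the debug output can get large, here is a minimal sketch for pulling
out only the last lines after a crash. It assumes the output file path
from Arno's example command; the helper name and the default of 50 lines
are arbitrary choices for illustration.

```python
from collections import deque


def last_lines(path, count=50):
    """Return the last `count` lines of the text file at `path`."""
    with open(path, "r", errors="replace") as handle:
        # deque with maxlen keeps only the newest `count` lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


if __name__ == "__main__":
    # Path taken from the example command above; adjust to the real location.
    for line in last_lines("/var/log/bacula-dir.output", 50):
        print(line)
```

The file is read once from start to end, so it stays cheap on memory even
at higher debug levels.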

AL> The other possibility is to run the DIR under the debugger - there are
AL> some instructions in the manual. It would be best if you know a little
AL> about how to work with gdb, though.

Not too much, unfortunately.

AL> Finally, and I suspect that this would be something you'd end up with 
AL> anyway, you could upgrade to the current release version 1.38. This 
AL> version does fix some bugs, introduces some features, and requires only
AL> minor - if at all - configuration changes. It does require a catalog 
AL> upgrade, so you will want to read the instructions carefully :-)

OK, I'll keep this in mind. But first of all I must try to revive
what I have...

AL> I suspect that, if you found a bug in bacula, you will be forced to
AL> upgrade because it's unlikely that Kern will fix an older version.

I see...

AL> Well, start with the debug log and probably the debugger. That should
AL> help understanding what happens when Bacula crashes. Or upgrade to 1.38
AL> and see if that fixes your problem (which might easily happen). The 
AL> upgrade itself is not a problem as long as you know how your installed
AL> version was built (options to configure)

I'll probably have to investigate it.

AL>  and have the necessary toolchain and libraries installed. The
AL> catalog upgrade can be a problem, as you cannot easily revert to
AL> an older version...

That matters.

>> Thanks for your attention. I really need help to make my problem clear
>> and solve it. Any good advice will move things from bad to good.

AL> Well, and good luck fixing your problems. Keep us informed, or post
AL> some more detailed information, and I'm confident that it can be fixed.

Sure. I'll post more information as soon as I have it.

Thank you very much!

-- 
   SY                       Vadim A. Umanski



_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users
