Hello,
On 02.11.2005 19:26, Vadim A. Umanski wrote:
How do you do, bacula-users.
I've inherited a backup system powered by Bacula (Bacula 1.36.1), it
runs on Solaris 10 for x86 and stores data on a disk array. The previous
sysadmin installed it, but he is no longer reachable. The server
running Bacula is used for some other important things also, so I have
to treat it and reconfigure it with caution.
That would be better...
It worked OK for a while; I'm new to Bacula, so I didn't touch
working software. It starts doing backups in the early night and finishes
in the morning: full backups every Sunday and incrementals daily. But
after some time one of the Bacula processes started to crash every morning,
and one or more (or all) jobs were left undone. This has lasted
for some weeks, and it became clear to me that I need help.
Ok, let's see what we can do.
Here I'll try to describe what's happening.
--------------------------------------------------------------
Normally on the server it looked like this
# ps -ef | grep bacula|grep -v grep
bacula 1362 1 0 10:23:03 ? 0:00
/usr/local/bacula/sbin/bacula-dir -u bacula -g bacula -v -c /usr/local/bacula/e
root 1350 1 0 10:22:40 ? 0:00
/usr/local/bacula/sbin/bacula-fd -u root -g root -v -c /usr/local/bacula/etc/ba
bacula 1348 1 0 10:22:40 ? 0:00
/usr/local/bacula/sbin/bacula-sd -u bacula -g bacula -v -c /usr/local/bacula/et
That's normal.
and every morning the Director's process, bacula-dir, is missing.
That's bad.
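To pin down when the Director dies, a small watcher could poll the process list and note the time bacula-dir disappears. This is only a minimal sketch for a POSIX shell; the function names, the 60-second interval and the log path in the usage comment are made up for illustration.

```shell
# director_in_ps: read `ps -ef` output on stdin and succeed if a
# bacula-dir process is listed (the grep process itself is ignored).
director_in_ps() {
    grep 'bacula-dir' | grep -v grep >/dev/null
}

# watch_director: poll once a minute and print a timestamp as soon as
# bacula-dir is gone. The interval is an arbitrary choice.
watch_director() {
    while ps -ef | director_in_ps; do
        sleep 60
    done
    echo "bacula-dir no longer running at $(date)"
}

# Usage (runs until the Director disappears; the path is hypothetical):
#   watch_director >>/tmp/bacula-dir-watch.log &
```

Comparing that timestamp with the job log should show which job was active when the Director went away.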
The end of this morning's log looks like this
# less /var/db/bacula/log
... skip some log output...
02-Nov 03:15 nfs4p-dir: Start Backup JobId 3960,
Job=sinux-oracle.2005-11-02_03.15.00
02-Nov 03:15 sinux-fd: ClientRunBeforeJob: -su: line 8: ulimit: max user
processes: cannot modify limit: Operation not permitted
That one indicates a problem, I guess. There seems to be a limit on the
number of processes a user can have running. Some script or program
tries to increase that limit.
You should investigate the script that is called as Client Run Before
Job script for the job sinux-oracle. Just to point out the obvious: That
script is not on the director machine (probably nfs4p) but on sinux.
That situation *could* indicate a serious security problem, even a
compromised database server. Good luck.
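A quick check on sinux is to compare the soft and hard process limits for the account the script runs under. A non-root process may raise its soft limit only up to the hard limit, and asking for more produces this kind of "cannot modify limit" message. The sketch below assumes bash (the "-su:" prefix suggests a shell started through su); the function names and the value 16384 are illustrative and not taken from your setup.

```shell
# show_nproc_limits: print the current soft and hard "max user processes"
# limits. (-u is a bash option; other shells may spell it differently.)
show_nproc_limits() {
    echo "soft: $(ulimit -S -u)  hard: $(ulimit -H -u)"
}

# nproc_fits REQUESTED HARD
# Succeed if an unprivileged process could set its limit to REQUESTED
# when the hard limit is HARD. Either value may be the word "unlimited".
nproc_fits() {
    [ "$2" = "unlimited" ] && return 0
    [ "$1" = "unlimited" ] && return 1
    [ "$1" -le "$2" ]
}

# Usage, as the user the Run Before Job script runs as:
#   show_nproc_limits
#   nproc_fits 16384 "$(ulimit -H -u)" || echo "16384 exceeds the hard limit"
```

If the value requested on line 8 of the script or profile is above the hard limit, the message is explained and is probably harmless.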
02-Nov 03:19 sinux-fd: ClientRunBeforeJob:
02-Nov 03:19 sinux-fd: ClientRunBeforeJob: SQL*Plus: Release 10.1.0.3.0 -
Production on Wed Nov 2 03:19:27 2005
02-Nov 03:19 sinux-fd: ClientRunBeforeJob:
02-Nov 03:19 sinux-fd: ClientRunBeforeJob: Copyright (c) 1982, 2004, Oracle.
All rights reserved.
02-Nov 03:19 sinux-fd: ClientRunBeforeJob:
02-Nov 03:19 sinux-fd: ClientRunBeforeJob:
02-Nov 03:19 sinux-fd: ClientRunBeforeJob: Connected to:
02-Nov 03:19 sinux-fd: ClientRunBeforeJob: Oracle Database 10g Enterprise
Edition Release 10.1.0.3.0 - Production
02-Nov 03:19 sinux-fd: ClientRunBeforeJob: With the Partitioning, OLAP and Data
Mining options
02-Nov 03:19 sinux-fd: ClientRunBeforeJob:
02-Nov 03:19 sinux-fd: ClientRunBeforeJob:
02-Nov 03:19 sinux-fd: ClientRunBeforeJob: TO_CHAR(SYSDATE,'YY
02-Nov 03:19 sinux-fd: ClientRunBeforeJob: -------------------
02-Nov 03:19 sinux-fd: ClientRunBeforeJob: 2005-11-02 03:19:28
02-Nov 03:19 sinux-fd: ClientRunBeforeJob:
02-Nov 03:19 sinux-fd: ClientRunBeforeJob: Disconnected from Oracle Database
10g Enterprise Edition Release 10.1.0.3.0 - Production
02-Nov 03:19 sinux-fd: ClientRunBeforeJob: With the Partitioning, OLAP and Data
Mining options
02-Nov 03:19 s10-sd: Volume "Vol0086" previously written, moving to end of data.
02-Nov 03:21 sinux-fd: ClientRunAfterJob: -su: line 8: ulimit: max user
processes: cannot modify limit: Operation not permitted
See above.
...
more output
"That's all, folks!" (c) :-(
I run
# /etc/bacula/bconsole
and see
Connecting to Director 127.0.0.1:9101
1000 OK: nfs4p-dir Version: 1.36.1 (26 November 2004)
Enter a period to cancel a command.
*status 1
Using default Catalog name=MyCatalog DB=bacula
Automatically selected Storage: File
Connecting to Storage daemon File at 10.253.4.15:9103
s10-sd Version: 1.36.1 (26 November 2004) i386-pc-solaris2.10 solaris 5.10
Daemon started 02-Nov-05 20:10, 0 Jobs run since started.
Running Jobs:
No Jobs running.
====
Terminated Jobs:
JobId Level Files Bytes Status Finished Name
======================================================================
3952 Incr 2,462 1,889,217 OK 02-Nov-05 01:21 ns02
3953 Incr 1 33,512,324 OK 02-Nov-05 01:22 sinux
3954 Incr 83 28,008,073 OK 02-Nov-05 01:23 dbh1-matroska
3955 Incr 0 0 OK 02-Nov-05 01:23 dbh1-configs
3956 Incr 0 0 OK 02-Nov-05 01:23 dbh1-home
3957 Incr 1,418 84,707,857 OK 02-Nov-05 01:28 hpov-full
3958 Incr 67 615,990,101 OK 02-Nov-05 01:37 dbh2-full
3959 Full 1 186,384,929 OK 02-Nov-05 01:51 BackupCatalog
3960 Full 5 181,932,043 OK 02-Nov-05 03:21 sinux-oracle
3961 Incr 9,889 2,246,582,808 Cancel 02-Nov-05 10:22 cgatex-full
====
Device status:
Device "/d/0/bacula" is not open.
====
The last job is the most important - it's the mail server... :-(
It looks like that job didn't fail but was cancelled. That status
should, as far as I know, only occur as a direct result of user
intervention.
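To see what was logged for that job, one could print the log from the job's start line onward. This is a rough sketch: the function name is made up, and the pattern assumes the "Start Backup JobId N" wording seen in your log excerpt. As an aside, the 10:22 finish time matches the 10:22 daemon start times in your ps output, so a restart may be what marked the job as cancelled, but that is only a guess.

```shell
# log_from_job JOBID LOGFILE
# Print LOGFILE from the first line that mentions "JobId JOBID" to the end.
# Jobs that ran at the same time will be mixed in, so read with care.
# (On Solaris, nawk or /usr/xpg4/bin/awk may be needed for the -v option.)
log_from_job() {
    awk -v id="$1" '
        $0 ~ ("JobId " id "([^0-9]|$)") { found = 1 }
        found { print }
    ' "$2"
}

# Usage, with the job number and log path from your mail:
#   log_from_job 3961 /var/db/bacula/log | less
```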
If I leave this console open until the next morning and try to enter any
command after bacula-dir crashes, it dies too, being unable to connect to
the Director.
Ok, the DIR dies during the night.
You can do the following:
Either run the director with debug output enabled and capture the
output. You'd call it with something like "./bacula-dir -v -d 200 -c
/etc/bacula/bacula-dir >>/var/log/bacula-dir.output". Adjust paths and
debug level to your needs... a debug level of 100 gives a good overview
of the program flow, 400 results in lots and lots of details, and 900
gives you more than you will need to locate the problem, I guess.
After the DIR crashes, you should investigate the last lines of the
output and perhaps post them here. That may help to locate the problem.
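Wrapped into two small helpers, that could look like the sketch below. The binary path is the one from your ps listing; the function names, the configuration path placeholder and the default of 50 lines are assumptions to adjust. Unlike the one-liner above, it also sends stderr to the same file, so error messages end up next to the debug output.

```shell
# start_dir_debug LEVEL CONF OUT
# Start the Director with debug level LEVEL and configuration file CONF,
# appending both stdout and stderr to OUT.
start_dir_debug() {
    /usr/local/bacula/sbin/bacula-dir -v -d "$1" -c "$2" >>"$3" 2>&1
}

# crash_tail OUT [N]
# Show the last N lines of the captured output (default: 50).
crash_tail() {
    tail -n "${2:-50}" "$1"
}

# Usage (stop the running Director first; the config path is a placeholder):
#   start_dir_debug 200 /path/to/bacula-dir.conf /var/log/bacula-dir.output
#   ... wait for the crash ...
#   crash_tail /var/log/bacula-dir.output 100
```

At debug level 200 the file can grow quickly, so keep an eye on free space in the target directory.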
The other possibility is to run the DIR under the debugger - there are
some instructions in the manual. It would be best if you know a little
about how to work with gdb, though.
Finally, and I suspect that this is something you'd end up with
anyway, you could upgrade to the current release version, 1.38. This
version fixes some bugs, introduces some features, and requires only
minor configuration changes, if any. It does require a catalog
upgrade, so you will want to read the instructions carefully :-)
I suspect that, if you have found a bug in Bacula, you will be forced to
upgrade, because it's unlikely that Kern will fix an older version.
I tried to search using quotes from the logs and messages I was getting,
but I haven't found anything that matches my problem. My
colleagues couldn't help me - they haven't seen any of this before.
I can of course restart Bacula (with the startup script
/etc/rc3.d/S50bacula with target restart, for example) but the
problem persists - I see exactly what I just wrote here.
--------------------------------------------------------------
Smart guys that know what to do - please help!
Maybe I should quote some extra logs or some configs or something
else... I really want to ask a good question so a good answer could
be given.
Well, start with the debug log and probably the debugger. That should
help in understanding what happens when Bacula crashes. Or upgrade to 1.38
and see if that fixes your problem (which might easily happen). The
upgrade itself is not a problem as long as you know how your installed
version was built (options to configure) and have the necessary
toolchain and libraries installed. The catalog upgrade can be a problem,
as you cannot easily revert to an older version...
Thanks for your attention. I really need help to make my problem clear
and solve it. Any good advice will move things from bad to good.
Good luck to everybody!
Well, good luck with fixing your problems. Keep us informed, or post
some more detailed information; I'm confident this can be fixed.
Arno
--
IT-Service Lehmann [EMAIL PROTECTED]
Arno Lehmann http://www.its-lehmann.de