Kern Sibbald wrote:
> On Sunday 06 December 2009 20:42:19 Jesper Krogh wrote:
>> Kern Sibbald wrote:
>>>> cwd is: /
>>>> $ cd /mnt/backup
>>>> cwd is: /mnt/backup/
>>>> $ mark cache
>>>> 2,872,501 files marked.
>>>> $ done
>>>> Bootstrap records written to /var/lib/bacula/bacula-dir.restore.1.bsr
>>> At that point, as far as I know, there is no more significant work for
>>> the Director to do. It just passes off the bootstrap file, which is
>>> written, then lets the FD and SD do their thing.
>> Sorry for not being precise enough in the first round. It is:
>> "after I type done" but "before the next line is written" that there are
>> 2.5 hours. So I guess it is building the bootstrap file?
>
> Yes, depending on how you have Bacula configured and how many files there are,
> building a bootstrap file can take a lot of time. I imagine that most of the
> time is spent in the catalog, but I have never actually measured it.
>
> The main factors are:
> 1. How many total files there are.

3.8 million, of which I marked 2.8 million.

> 2. How many JobIds are involved.

2-5 (5 the other day, but today I got a differential in the loop).

> 3. The number of JobMedia records, which depend on the number of JobIds and
> the value set for Maximum File Size in the Storage daemon.

I don't set Maximum File Size, so that's the default, but I spool to disk
and then despool to tape in 8GB batches. The data to restore is around
200GB.

mysql> select count(*) from JobMedia where JobId in (31988,32768);
+----------+
| count(*) |
+----------+
|      820 |
+----------+
1 row in set (0.01 sec)
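For my own reference, this is roughly where Maximum File Size would go if
I wanted to experiment with it -- a sketch of a Device resource in
bacula-sd.conf with made-up device names, not something I have actually
tested:

# bacula-sd.conf -- sketch only; Name, Media Type and Archive Device are
# placeholders, not my real config.
Device {
  Name = Tape-Drive-0
  Media Type = LTO-4
  Archive Device = /dev/nst0
  # Kern's point 3: the number of JobMedia records depends on this value,
  # so a larger Maximum File Size should mean fewer JobMedia rows per job.
  Maximum File Size = 8GB
}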
>>>> Just after done, the system waited for around 2.5 hours before getting
>>>> onto the actual restore. Seen from the system side it was pure CPU load,
>>>> having one thread sitting at 100% CPU, absolutely no database activity
>>>> and a decent (not growing) memory usage (~512MB).
>>>>
>>>> Most of the time it actually never got to done, but somehow the thread
>>>> taking care of the job just got killed (a watchdog timeout perhaps?)
>>> Unless you have set some maximum runtime, the thread should not be
>>> killed.
>> Strange, I've seen this repeatedly, but it is not severe enough to get
>> the director to terminate, just the thread.. nothing else.
>
> Well, it could either be that your DB engine is very busy or that there are
> really huge numbers of files and/or JobMedia records.

I have run strace on it, and if it were talking to the DB I would see it
there and in "mysqladmin processlist" on the DB. I have strace'd all the
sub-threads of bacula-dir. "Huge" is relative, but I don't think the
numbers above are huge?

>>>> I'm still on Bacula 2.4, so just let me know if this has been looked
>>>> into in 3.0.
>>> I recommend that you duplicate the problem then trap the Director with
>>> the debugger and find out what it is doing (i.e. where it is spending its
>>> time). This sounds odd, though it is possible I am overlooking something.
>> Is there a short guide on how to do this somewhere?
>
> The Kaboom chapter of the manual tells you how to run the Director under the
> debugger. You can also attach to the Director while it is running, using:
>
> cd <bacula-binary-directory>
> gdb bacula-dir <pid-of-director>

I can get it to run, but I'll have to read more documentation to find out
where it is actually looping or similar. It seems that even though my
bacula is built with -g, I don't have any symbols accessible for gdb.
(I think it is 5 years since I last toyed around with gdb.)
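Once I figure out the symbols, my plan is something along these lines to
see which thread is spinning -- untested on my side, and the binary path
is just a guess from my setup:

# Attach to the running Director, as Kern suggests (path is a guess):
cd /usr/sbin
gdb bacula-dir $(pidof bacula-dir)

# Inside gdb, dump a backtrace of every thread; the one burning 100% CPU
# should keep showing up in the same function:
(gdb) info threads
(gdb) thread apply all bt
(gdb) detach
(gdb) quit

If the installed binary was stripped during packaging, that would also
explain the missing symbols despite the -g build, but that is just a
guess on my part.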
--
Jesper