Kern Sibbald wrote:
> On Sunday 06 December 2009 20:42:19 Jesper Krogh wrote:
>> Kern Sibbald wrote:
>>>> cwd is: /
>>>> $ cd /mnt/backup
>>>> cwd is: /mnt/backup/
>>>> $ mark cache
>>>> 2,872,501 files marked.
>>>> $ done
>>>> Bootstrap records written to /var/lib/bacula/bacula-dir.restore.1.bsr
>>> At that point, as far as I know, there is no more significant work for
>>> the Director to do.  It just writes the bootstrap file, passes it off,
>>> and then lets the FD and SD do their thing.
>> Sorry for not being precise enough the first time. The 2.5 hours pass
>> between "after I type done" and "before the next line is written". So I
>> guess it is building the bootstrap file?
> 
> Yes, depending on how you have Bacula configured and how many files there
> are, building a bootstrap file can take a lot of time.  I imagine that most
> of the time is spent in the catalog, but I have never actually measured it.
> 
> The main factors are:
> 1. How many total files there are.

3.8m in total, of which I marked 2.8m.

> 2. How many JobIds are involved.

2-5 (5 the other day, but today a differential is part of the chain).

> 3. The number of JobMedia records, which depends on the number of JobIds and
> the value set for Maximum File Size in the Storage daemon.

I don't set Maximum File Size, so it is at the default, but I spool to
disk and then to tape in 8GB batches. The data to restore is around
200GB.

mysql> select count(*) from JobMedia where JobId in (31988,32768);
+----------+
| count(*) |
+----------+
|      820 |
+----------+
1 row in set (0.01 sec)
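For scale: as far as I understand the bootstrap format, the marked files end up described as FileIndex ranges per volume and session, so with 2.8m marked files it matters how the marked FileIndex values are merged into ranges. A minimal Python sketch of such a merge, assuming the indexes are available as plain integers; the function names are hypothetical and this is not Bacula's actual bootstrap code. Sorting first keeps it at O(n log n); a builder that instead searched its existing range list for every marked file would do work roughly proportional to files x ranges, which is one kind of pattern that could show up as a single thread at 100% CPU with no database traffic.

```python
def to_ranges(indexes):
    """Collapse FileIndex values into sorted, inclusive (start, end) ranges."""
    ranges = []
    for i in sorted(set(indexes)):
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1][1] = i          # extends the current contiguous run
        else:
            ranges.append([i, i])      # gap found: start a new run
    return [(start, end) for start, end in ranges]


def bsr_findex_lines(ranges):
    """Format ranges as bootstrap-style FileIndex lines (illustrative only)."""
    return ["FileIndex=%d" % s if s == e else "FileIndex=%d-%d" % (s, e)
            for s, e in ranges]


if __name__ == "__main__":
    for line in bsr_findex_lines(to_ranges([7, 1, 2, 3, 5, 8])):
        print(line)
```

Run directly, it prints FileIndex=1-3, FileIndex=5 and FileIndex=7-8, one per line.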


>>>> Just after done, the system waited for around 2.5 hours before getting
>>>> onto the actual restore. Seen from the system side it was pure CPU load,
>>>> with one thread sitting at 100% CPU, absolutely no database activity,
>>>> and moderate, stable (not growing) memory usage (~512MB).
>>>>
>>>> Most of the time it never actually got past done; instead the thread
>>>> handling the job just got killed (a watchdog timeout, perhaps?)
>>> Unless you have set some maximum runtime, the thread should not be
>>> killed.
>> Strange. I've seen this repeatedly, but it is not severe enough to make
>> the Director terminate; only the thread dies, nothing else.
> 
> Well, it could either be that your DB engine is very busy or that there are 
> really huge numbers of files and/or JobMedia records.

I have put strace on it, and if it were talking to the DB I would see
that there, and in "mysqladmin processlist" on the DB server. I have
strace'd all the sub-threads of bacula-dir.
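Before strace'ing every sub-thread, it can help to identify which thread is actually burning CPU; its TID can then be given to `strace -p` or matched against the LWP numbers in gdb's `info threads`. A minimal Linux-only Python sketch, assuming /proc is mounted: it samples utime+stime from /proc/<pid>/task/<tid>/stat twice and ranks threads by the difference. The helper names are hypothetical; `top -H -p <pid>` gives similar information interactively.

```python
import os
import sys
import time


def thread_cpu_ticks(pid):
    """Return {tid: utime + stime in clock ticks} for every thread of pid."""
    ticks = {}
    task_dir = "/proc/%d/task" % pid
    for name in os.listdir(task_dir):
        tid = int(name)
        try:
            with open("%s/%d/stat" % (task_dir, tid)) as f:
                data = f.read()
        except OSError:
            continue  # thread exited between listdir() and open()
        # comm (field 2) may contain spaces, so split after the last ')'.
        # fields[0] is then field 3 (state); utime/stime are fields 14/15.
        fields = data.rsplit(")", 1)[1].split()
        ticks[tid] = int(fields[11]) + int(fields[12])
    return ticks


def busiest_threads(pid, interval=1.0):
    """Sample twice, return [(tid, ticks_used)] sorted busiest first."""
    before = thread_cpu_ticks(pid)
    time.sleep(interval)
    after = thread_cpu_ticks(pid)
    used = [(tid, after[tid] - before.get(tid, 0)) for tid in after]
    return sorted(used, key=lambda t: t[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    interval = 1.0
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    for tid, used in busiest_threads(int(sys.argv[1]), interval)[:5]:
        print("tid %d: %.0f%% CPU" % (tid, 100.0 * used / (hz * interval)))
```

Pass the Director's PID as the only argument; a thread stuck near 100% for the whole 2.5 hours should stand out at the top of the list.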

"Huge" is relative, but I don't think 3.8m files, 2-5 JobIds and 820
JobMedia records are huge?

>>>> I'm still on Bacula 2.4, so just let me know if this has been looked
>>>> into in 3.0.
>>> I recommend that you duplicate the problem then trap the Director with
>>> the debugger and find out what it is doing (i.e. where it is spending its
>>> time). This sounds odd, though it is possible I am overlooking something.
>> Is there a short guide on how to do this somewhere?
> 
> The Kaboom chapter of the manual tells you how to run the Director under the 
> debugger.  You can also attach to the Director while it is running, using:
> 
>   cd <bacula-binary-directory>
>   gdb bacula-dir <pid-of-director>

I can get it to run, but I'll have to read more documentation to find
out where it is actually looping. It seems that even though my Bacula
is built with -g, gdb doesn't see any symbols. (I think it has been 5
years since I last toyed around with gdb.)
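Rather than stepping through the code, one low-effort way to find where the busy thread loops is to take a handful of all-thread backtraces a few seconds apart (sometimes called a "poor man's profiler") and look for frames that keep recurring in the same thread. A minimal Python sketch, assuming gdb is installed and you are allowed to ptrace the Director (typically as root); the helper names are hypothetical. The frames will only show function names if gdb can find debug symbols for the installed bacula-dir binary.

```python
import subprocess
import sys
import time


def gdb_backtrace_cmd(pid):
    """gdb argv that attaches to pid, prints every thread's stack, then detaches."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def sample_stacks(pid, samples=5, delay=2.0):
    """Collect several all-thread backtraces, delay seconds apart."""
    snapshots = []
    for _ in range(samples):
        # The Director is paused only while gdb is attached; -batch makes
        # gdb exit (and detach) as soon as the command has run.
        result = subprocess.run(gdb_backtrace_cmd(pid),
                                capture_output=True, text=True)
        snapshots.append(result.stdout)
        time.sleep(delay)
    return snapshots


if __name__ == "__main__" and len(sys.argv) > 1:
    for n, text in enumerate(sample_stacks(int(sys.argv[1])), 1):
        print("==== sample %d ====" % n)
        print(text)
```

Pass the Director's PID as the only argument; if the same function shows up at the top of the busy thread's stack in most samples, that is a good candidate for where the time goes.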

-- 
Jesper

_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel
