On Thursday 31 December 2009 14:06:26 Martin Simmons wrote:
> >>>>> On Wed, 30 Dec 2009 09:44:49 +0100, Jesper Krogh said:
> >
> > Martin Simmons wrote:
> > >>>>>> On Tue, 29 Dec 2009 11:05:09 +0100, Jesper Krogh said:
> > >>
> > >> Kern Sibbald wrote:
> > >>> The Kaboom chapter of the manual tells you how to run the Director
> > >>> under the debugger. You can also attach to the Director while it is
> > >>> running, using:
> > >>>
> > >>> cd <bacula-binary-directory>
> > >>> gdb bacula-dir <pid-of-director>
> > >>
> > >> About a month later, the problem is still present: it takes hours to
> > >> get from "done" to the actual restore starting. I've managed to get a
> > >> backtrace:
> > >>
> > >> Thread 2 (Thread 0x42767950 (LWP 10832)):
> > >> #0 0x000000000040a4a8 in add_findex (bsr=0x6dd468, JobId=32927,
> > >> findex=132808) at bsr.c:554
> > >> #1 0x0000000000432965 in restore_cmd (ua=0x6dcb08, cmd=<value
> > >> optimized out>) at ua_restore.c:1094
> > >> #2 0x0000000000425e56 in do_a_command (ua=0x6dcb08, cmd=0x6d5b10 "1")
> > >> at ua_cmds.c:180
> > >> #3 0x0000000000438781 in handle_UA_client_request (arg=<value
> > >> optimized out>) at ua_server.c:147
> > >> #4 0x000000000046cb8b in workq_server (arg=<value optimized out>) at
> > >> workq.c:357
> > >> #5 0x00007f233e9553f7 in start_thread () from /lib/libpthread.so.0
> > >> #6 0x00007f233db1db4d in clone () from /lib/libc.so.6
> > >> #7 0x0000000000000000 in ?? ()
> > >>
> > >> Repeating this with "continue/interrupt" gives the same trace but
> > >> with different findex= values.
> > >>
> > >> The restore block looks like this:
> > >>
> > >> +--------+-------+-----------+-----------------+---------------------+------------+
> > >> | JobId  | Level | JobFiles  | JobBytes        | StartTime           | VolumeName |
> > >> +--------+-------+-----------+-----------------+---------------------+------------+
> > >> | 32,927 | F     | 3,183,314 | 684,965,311,013 | 2009-12-12 17:31:41 | 000779L3   |
> > >> | 32,927 | F     | 3,183,314 | 684,965,311,013 | 2009-12-12 17:31:41 | 000789L3   |
> > >> | 32,927 | F     | 3,183,314 | 684,965,311,013 | 2009-12-12 17:31:41 | 000804L3   |
> > >> | 32,927 | F     | 3,183,314 | 684,965,311,013 | 2009-12-12 17:31:41 | 001805L3   |
> > >> | 32,927 | F     | 3,183,314 | 684,965,311,013 | 2009-12-12 17:31:41 | 001806L3   |
> > >> | 32,927 | F     | 3,183,314 | 684,965,311,013 | 2009-12-12 17:31:41 | 001807L3   |
> > >> | 33,446 | D     | 136,256   | 50,695,957,124  | 2009-12-28 08:01:50 | 004048L4   |
> > >> | 33,473 | I     | 1,224     | 16,023,974,683  | 2009-12-28 14:41:19 | 004059L4   |
> > >> | 33,501 | I     | 11,188    | 24,448,676,227  | 2009-12-29 01:40:23 | 004059L4   |
> > >> +--------+-------+-----------+-----------------+---------------------+------------+
> > >>
> > >> I'm on 2.4.3 and the bsr.c:554 is
> > >>
> > >> /* Walk down fi chain and find where to insert insert new FileIndex */
> > >> for ( ; fi; fi=fi->next) {
> > >>    if (findex == (fi->findex2 + 1)) {  /* extend up */
> > >>       RBSR_FINDEX *nfi;
> > >>       fi->findex2 = findex;
> > >>
> > >> If I get some more time I'll try to add debug information to find out
> > >> where it's actually looping. Suggestions are certainly welcome.
> > >
> > > It might be a variant of this problem:
> > >
> > > http://article.gmane.org/gmane.comp.bacula.user/54164/match=add%5ffindex
> >
> > It looks quite a lot like the same problem. But I diffed the bsr.c of
> > the latest version against the 2.4.3 one and there are no changes.
> > Since an upgrade is non-reversible, I would prefer not to be "forced"
> > into it, but rather to do it at a time when I have sufficient time for
> > testing.
> >
> > Can you point to the changes that are supposed to deal with the problem?
>
> AFAIK, bsr.c didn't change. The fix was in the code that builds the tree,
> which now sorts the items by FileIndex. The function db_get_file_list was
> added and is called by build_directory_tree in ua_restore.c.

Yes, and in addition (I forget at which point) we also changed the in-memory
tree from linked lists to red-black binary trees.
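
To illustrate why feeding the entries in FileIndex order matters for a chain like the one walked in add_findex, here is a minimal sketch. It is not Bacula's code: `Range`, `add_index`, `total_visits`, `sorted_order` and `odd_then_even` are hypothetical names, and the chain is a simplified stand-in for the real RBSR_FINDEX list. It only counts how many chain nodes are visited for a given insertion order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a chain of FileIndex ranges (not Bacula's RBSR_FINDEX).
struct Range {
    int32_t first;
    int32_t last;
    Range *next;
};

// Add one index to the chain, walking from the head as a singly linked list must.
// Returns how many nodes were visited before the index was placed.
long add_index(Range *&head, int32_t findex) {
    long visited = 0;
    Range *prev = nullptr;
    for (Range *r = head; r != nullptr; prev = r, r = r->next) {
        ++visited;
        if (findex >= r->first && findex <= r->last) {
            return visited;                 // already covered
        }
        if (findex == r->last + 1) {        // extend up
            r->last = findex;
            return visited;
        }
        if (findex == r->first - 1) {       // extend down
            r->first = findex;
            return visited;
        }
    }
    Range *node = new Range{findex, findex, nullptr};  // no adjacent range: append
    if (prev != nullptr) {
        prev->next = node;
    } else {
        head = node;
    }
    return visited;
}

// Build a chain from the given insertion order and return the total nodes visited.
long total_visits(const std::vector<int32_t> &order) {
    Range *head = nullptr;
    long total = 0;
    for (int32_t findex : order) {
        total += add_index(head, findex);
    }
    while (head != nullptr) {               // free the chain
        Range *next = head->next;
        delete head;
        head = next;
    }
    return total;
}

// Ascending order: 1, 2, 3, ..., n.
std::vector<int32_t> sorted_order(int32_t n) {
    assert(n > 0);
    std::vector<int32_t> v;
    for (int32_t i = 1; i <= n; ++i) v.push_back(i);
    return v;
}

// Scattered order: all odd indexes first, then all even ones.
std::vector<int32_t> odd_then_even(int32_t n) {
    assert(n > 0);
    std::vector<int32_t> v;
    for (int32_t i = 1; i <= n; i += 2) v.push_back(i);
    for (int32_t i = 2; i <= n; i += 2) v.push_back(i);
    return v;
}
```

In this toy model, `total_visits(sorted_order(n))` grows linearly with n because every insert extends the first range, while `total_visits(odd_then_even(n))` grows quadratically because each insert has to walk past the ranges created before it. With millions of files per job, that difference could plausibly account for hours spent in a loop of this shape.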

In general, unless you have a huge number of files in the set being prepared,
building the bsr should not take a long time. Sorting out the JobMedia index
records *is* very compute-intensive, but in general there should not be too
many of them. If you are finding that there are a lot of JobMedia records,
then perhaps someone has poorly configured the "Maximum File Size" directive
in the Storage daemon Device resource. It should be set to a minimum of 1G,
and if you have either an LTO-4 or a huge number of files in the FileSet,
then it probably should be set at 5G.

The manual has a bit on configuring the Maximum File Size ...
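
As a rough back-of-the-envelope check, the following sketch estimates how many position records a single job might generate for a given Maximum File Size. It assumes about one record each time the Maximum File Size is reached, and that "G" means 2^30 bytes; `estimate_records` is a hypothetical helper, not a Bacula function, and the real accounting (volume changes, multiple jobs interleaved) will differ.

```cpp
#include <cstdint>

// Rough estimate, under the stated assumptions, of the number of records
// written for one job: one per chunk of max_file_size bytes, rounded up.
uint64_t estimate_records(uint64_t job_bytes, uint64_t max_file_size) {
    return (job_bytes + max_file_size - 1) / max_file_size;
}

// Size constants, assuming "G" means 2^30 bytes.
const uint64_t ONE_G  = 1073741824ULL;
const uint64_t FIVE_G = 5ULL * ONE_G;
```

For the Full job in the table above (684,965,311,013 bytes), this gives about 638 records at 1G and about 128 at 5G, whereas a setting as small as 1,000,000 bytes would give roughly 685,000.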

Regards,
Kern
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel