On Thursday 13 September 2007 13:37, Martin Simmons wrote:
> >>>>> On Wed, 12 Sep 2007 23:09:17 +0200, Kern Sibbald said:
> >
> > Hello,
> >
> > I believe that I may have a fix for the FD crash shown below. I'll
> > commit it tomorrow as I have another fix for bug #920 that I need to
> > fully test before committing.
>
> Is there a danger that it can terminate without deleting the pid etc now?
> It may be better to call pthread_exit instead of exit when already_here is
> true.
I'm hoping that the true fix is the reordering of some of the termination
calls, although the dump that Dan produced clearly indicated the routine was
possibly called twice (the 05a5a ... argument to a call indicates reuse of a
freed Bacula buffer).
Concerning the use of exit(1): my view is that if we get into that routine
twice, it is best to get out fast. None of the exit routines are critical --
yes not deleting the pid file could be annoying, but normally if we were
already in the file, it should be deleted.
>
> Another minor issue is that lock-less use of variables like already_here is
> not safe on CPUs with speculative execution. It is unlikely to happen in
> this code, but you never know.
Yes, it is possible that we could pass through the routine twice because the
variable is not set inside a mutex. However, we are trying to exit, and it
will be set at some point. The important point of the test is not so much to
prevent calling the routine twice, which should not really cause any harm,
but prevent it from getting into some seg fault infinite loop. IMO, any
additional use of mutexes is dangerous (could cause a deadlock) and overkill
when our main goal at that point is to do a minimal cleanup and exit.
I'm willing to see a patch that could simplify things even further, or would
improve the chances that the pid file is really deleted, but would be *very*
reluctant to accept anything that has additional locking (fear of a
deadlock).
>
> __Martin
>
> > Regards,
> >
> > Kern
> >
> > On Wednesday 12 September 2007 21:15, Martin Simmons wrote:
> > > >>>>> On Wed, 12 Sep 2007 12:00:59 -0400, Dan Langille said:
> > > >
> > > > Priority: normal
> > > > Content-description: Mail message body
> > > >
> > > > After I encounterd repeated cores of bacula-sd, Eric Bollengier
> > > > suggested this patch. It passed my first set of regression tests
> > > > without producing a bacula-sd.core
> > > >
> > > > The traceback appears at the end of this email. Any objections to a
> > > > commit now?
> > > >
> > > > $ svn di
> > > > Index: stored/stored.c
> > > > ===================================================================
> > > > --- stored/stored.c (revision 5534)
> > > > +++ stored/stored.c (working copy)
> > > > @@ -600,10 +600,10 @@
> > > > if (debug_level > 10) {
> > > > print_memory_pool_stats();
> > > > }
> > > > - term_reservations_lock();
> > > > term_msg();
> > > > cleanup_crypto();
> > > > free_volume_list();
> > > > + term_reservations_lock();
> > > > close_memory_pool();
> > > >
> > > > sm_dump(false); /* dump orphaned buffers */
> > > > $
> > >
> > > Looks right, but that won't fix the bacula-fd crash you included below
> > > :-)
> > >
> > > __Martin
> > >
> > > > And FYI:
> > > >
> > > > [EMAIL PROTECTED]:~/src/BaculaRegressionTesting] $ gdb bin/bacula-fd
> > > > bacula-
> > > > fd.core
> > > > GNU gdb 6.1.1 [FreeBSD]
> > > > Copyright 2004 Free Software Foundation, Inc.
> > > > GDB is free software, covered by the GNU General Public License, and
> > > > you are
> > > > welcome to change it and/or distribute copies of it under certain
> > > > conditions.
> > > > Type "show copying" to see the conditions.
> > > > There is absolutely no warranty for GDB. Type "show warranty" for
> > > > details.
> > > > This GDB was configured as "i386-marcel-freebsd"...
> > > > Core was generated by `bacula-fd'.
> > > > Program terminated with signal 11, Segmentation fault.
> > > > Reading symbols from /lib/libz.so.4...done.
> > > > Loaded symbols for /lib/libz.so.4
> > > > Reading symbols from /lib/libthr.so.3...done.
> > > > Loaded symbols for /lib/libthr.so.3
> > > > Reading symbols from /usr/local/lib/libintl.so.8...done.
> > > >
> > > > Loaded symbols for /usr/local/lib/libintl.so.8
> > > >
> > > > Reading symbols from /usr/local/lib/libiconv.so.3...done.
> > > > Loaded symbols for /usr/local/lib/libiconv.so.3
> > > > Reading symbols from /usr/lib/libssl.so.5...done.
> > > > Loaded symbols for /usr/lib/libssl.so.5
> > > > Reading symbols from /lib/libcrypto.so.5...done.
> > > > Loaded symbols for /lib/libcrypto.so.5
> > > > Reading symbols from /usr/lib/libstdc++.so.6...done.
> > > > Loaded symbols for /usr/lib/libstdc++.so.6
> > > > Reading symbols from /lib/libm.so.5...done.
> > > > Loaded symbols for /lib/libm.so.5
> > > > Reading symbols from /lib/libgcc_s.so.1...done.
> > > > Loaded symbols for /lib/libgcc_s.so.1
> > > > Reading symbols from /lib/libc.so.7...done.
> > > > Loaded symbols for /lib/libc.so.7
> > > > Reading symbols from /libexec/ld-elf.so.1...done.
> > > > Loaded symbols for /libexec/ld-elf.so.1
> > > > #0 get_first_port_host_order (addrs=0x5a5a5a5a) at address_conf.h:64
> > > > 64 address_conf.h: No such file or directory.
> > > > in address_conf.h
> > > > [New Thread 0x28601200 (LWP 100058)]
> > > > [New Thread 0x28601100 (LWP 100478)]
> > > > (gdb) backtrace
> > > > #0 get_first_port_host_order (addrs=0x5a5a5a5a) at address_conf.h:64
> > > > #1 0x0804cc93 in terminate_filed (sig=15) at filed.c:238
> > > > #2 0x0807dbf9 in signal_handler (sig=15) at signal.c:180
> > > > #3 0xbfbfffb4 in ?? ()
> > > > #4 0x0000000f in ?? ()
> > > > #5 0x00000000 in ?? ()
> > > > #6 0xbf9febc0 in ?? ()
> > > > #7 0x00000000 in ?? ()
> > > > #8 0x0807d940 in init_stack_dump () at signal.c:190
> > > > #9 0x280e41de in pthread_cond_init () from /lib/libthr.so.3
> > > > #10 0x08081588 in watchdog_thread (arg=0x0) at watchdog.c:307
> > > > #11 0x280dda7f in pthread_getprio () from /lib/libthr.so.3
> > > > #12 0x00000000 in ?? ()
> > > > Current language: auto; currently c++
> > > > (gdb)
> > > >
> > > > --
> > > > Dan Langille - http://www.langille.org/
> > > > Available for hire: http://www.freebsddiary.org/dan_langille.php
> > >
> > > -----------------------------------------------------------------------
> > >-- This SF.net email is sponsored by: Microsoft
> > > Defy all challenges. Microsoft(R) Visual Studio 2005.
> > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > > _______________________________________________
> > > Bacula-devel mailing list
> > > [email protected]
> > > https://lists.sourceforge.net/lists/listinfo/bacula-devel
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel