Thanks again for the tip and the patch :-)

Regards,

Kern

On Tuesday 28 August 2007 23:27, Martin Simmons wrote:
> Looks like it might be another case of using the va_list pointer more than
> once (if it loops with goto again).  Unlike the other case, it is probably
> impossible to fix within bmsg itself, so you need to loop in all of the
> callers.  Today's Lisp reference: ellipsis is not &rest :-)
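
For readers following the thread, here is a minimal sketch of the "loop in the callers" idea described above. It is not the Bacula patch: send_msg_sketch and format_once are hypothetical names, and the sketch assumes a C99-style vsnprintf that returns the length it needed, which may not match bvsnprintf. The point is that va_start runs again on every pass, so no formatting call sees a va_list that an earlier call has already consumed.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for a bmsg-style helper: it formats once into buf
// and returns the length vsnprintf wanted. It consumes ap exactly once.
static int format_once(std::vector<char> &buf, const char *fmt, va_list ap)
{
   return vsnprintf(buf.data(), buf.size(), fmt, ap);
}

// Hypothetical caller that owns the retry loop. va_start/va_end run on
// every pass, so each formatting attempt gets a fresh va_list.
std::string send_msg_sketch(const char *fmt, ...)
{
   assert(fmt != nullptr);
   std::vector<char> buf(8);           // deliberately small to force a retry
   for (;;) {
      va_list ap;
      va_start(ap, fmt);
      int len = format_once(buf, fmt, ap);
      va_end(ap);
      if (len < 0) {
         return std::string();         // vsnprintf reported an error
      }
      if (static_cast<std::size_t>(len) < buf.size()) {
         return std::string(buf.data(), static_cast<std::size_t>(len));
      }
      buf.resize(static_cast<std::size_t>(len) + 1);  // grow and try again
   }
}
```

Because the buffer starts at 8 bytes, any longer message takes the retry path at least once. Where va_copy (C99, C++11) is available, copying the list before each formatting call inside the helper may be another option.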
>
> I don't think the value of format in the backtrace is necessarily a
> problem, because something might be adjusting the parameter within
> vsnprintf, which could show up in the backtrace due to optimizations (value
> in the same register).
>
> __Martin
>
> >>>>> On Tue, 28 Aug 2007 07:53:34 +0200, Kern Sibbald said:
> >
> > Hello Dirk,
> >
> > I've looked over the code, and if there is something wrong with it, I am
> > certainly missing it.  Perhaps someone on the devel list will see
> > something that I cannot.
> >
> > At this point, I suspect a compiler bug.  Could you give me the
> > following information?
> >
> > 1. The version of the compiler and the architecture for each machine
> > where you have the failure.
> > 2. The version of the compiler and the architecture for each machine
> > where you do not have the failure.
> >
> > Could you give me the complete compile line with all the options for both
> > dird/ua_output.c and lib/bsnprintf.c?  Either edit the Makefile and
> > remove the $(NO_ECHO) in front of the compile rules (the .c.o: and .cc.o:
> > lines), or set the environment variable NO_ECHO to the empty string.
> >
> > Could you set the compile optimization for those two files to -O0 (minus
> > oh zero)?  Either edit the Makefile or set it on the command line via a
> > preceding environment variable setting of CFLAGS.   Then test again and
> > see if it fails.
> >
> > As a separate test, if the above test still fails, could you comment out
> > the #define USE_BSNPRINTF 1
> > line in src/version.h and then rebuild everything?
> >
> > Another interesting test would be to put:
> >
> >    Dmsg1(000, "fmt=%s\n", fmt);
> >
> > just after the line "again:" at line 737 in src/dird/ua_output.c  as well
> > as:
> >
> >    Dmsg0(000, "goto again\n");
> >
> > after "msg = realloc_pool_memory(msg, maxlen + maxlen/2);" at line
> > 741.  Then report what it prints when the seg fault occurs.
> >
> > Best regards,
> >
> > Kern
> >
> > PS: for the list, the problem is clearly (according to the traceback) in
> > Thread 2 between stack frame 4 and 5 where the argument "fmt" in stack
> > frame 5 should be identical to argument "format" in stack frame 4, but
> > has been shifted by 2 bytes!
> >
> > On Monday 27 August 2007 22:57, Dirk H Bartley wrote:
> > > Well, I'm persistent but still not succeeding.  I went back to revision
> > > 5397 which is before you made some changes to prevent the director
> > > segfault.
> > >
> > > I've got -g in the CC flags so the symbol table is now there.  That's
> > > good.
> > >
> > > I don't know gdb well enough to get what I want out of it while
> > > debugging.
> > >
> > > All the bad stuff is happening somewhere between frame 7 and frame 4.
> > >
> > > In sql_handler there is the line:
> > > ua->send_msg("%s", rows.c_str());
> > > where rows looks like it holds the correct value.
> > >
> > > which calls
> > > void UAContext::send_msg(const char *fmt, ...)
> > > {
> > >    va_list arg_ptr;
> > >    va_start(arg_ptr, fmt);
> > >    bmsg(this, fmt, arg_ptr);
> > >    va_end(arg_ptr);
> > > }
> > >
> > > which is where I can no longer look at the value.  Only fmt gives
> > > the "%s" that it should.
> > >
> > > Anyway, I'll try another day, but I can't say that it will work
> > > properly by the weekend.  I may need to be able to toss ideas back and
> > > forth while I'm doing it to succeed.  My experience with the commands
> > > in gdb is almost nil.
> > >
> > > The head does not crash.  It just returns null to the joblog when it
> > > should return the long log of the job actually running.
> > >
> > > Dirk
> > >
> > > Backtrace below with symbol table working.  This is at
> > > svn up -r5397
> > >
> > > Thread 2 (Thread 1098918208 (LWP 20666)):
> > > #0  0x00002ab9b89f4aef in waitpid () from /lib/libpthread.so.0
> > > #1  0x000000000046c643 in signal_handler (sig=11) at signal.c:167
> > > #2  <signal handler called>
> > > #3  0x00002ab9b959bc10 in strlen () from /lib/libc.so.6
> > > #4  0x000000000045a236 in bvsnprintf (buffer=0x6f5d90 "", maxlen=939,
> > >     format=0x514736 "", args=0x41801ec0) at bsnprintf.c:412
> > > #5  0x0000000000431f06 in bmsg (ua=0x6e8428, fmt=0x514734 "%s",
> > >     arg_ptr=0x41801ec0) at ua_output.c:738
> > > #6  0x0000000000432266 in UAContext::send_msg (this=0x9, fmt=0x0) at
> > >     ua_output.c:775
> > > #7  0x000000000042e3a5 in sql_handler (ctx=0x6e8428, num_field=3,
> > >     row=0x6e86b8) at ua_dotcmds.c:479
> > >
> > >
> > > -------- Forwarded Message --------
> > > From: [EMAIL PROTECTED]
> > > To: [EMAIL PROTECTED]
> > > Subject: Bacula GDB traceback of bacula-dir
> > > Date: Mon, 27 Aug 2007 16:17:32 -0500 (EDT)
> > >
> > > Using host libthread_db library "/lib/libthread_db.so.1".
> > > [Thread debugging using libthread_db enabled]
> > > [New Thread 46977176811168 (LWP 20605)]
> > > [New Thread 1098918208 (LWP 20666)]
> > > [New Thread 1090525504 (LWP 20611)]
> > > [New Thread 1082132800 (LWP 20610)]
> > > 0x00002ab9b89f4621 in ?? () from /lib/libpthread.so.0
> > > $1 = "srvalum3-dir", '\0' <repeats 17 times>
> > > $2 = 0x69e088 "bacula-dir"
> > > $3 = 0x69e0c8 "/usr/sbin/bacula-dir"
> > > $4 = 0x6e4638 "PostgreSQL"
> > > $5 = 0x527e38 "2.3.1 (21 August 2007)"
> > > $6 = 0x50bd4b "x86_64-pc-linux-gnu"
> > > $7 = 0x50bd44 "gentoo"
> > > $8 = 0x5155da ""
> > > #0  0x00002ab9b89f4621 in ?? () from /lib/libpthread.so.0
> > > #1  0x0000000000453db3 in bmicrosleep (sec=60, usec=0) at bsys.c:71
> > > #2  0x000000000042990a in wait_for_next_job (one_shot_job_to_run=<value
> > >     optimized out>) at scheduler.c:130
> > > #3  0x000000000040dcc0 in main (argc=0, argv=0x7ffff263e228) at dird.c:285
> > >
> > > Thread 4 (Thread 1082132800 (LWP 20610)):
> > > #0  0x00002ab9b95e23b2 in select () from /lib/libc.so.6
> > > #1  0x000000000045560c in bnet_thread_server (addrs=0x69ea68,
> > >     max_clients=20, client_wq=0x697f60, handle_client_request=0x43e4d0
> > >     <handle_UA_client_request>) at bnet_server.c:161
> > > #2  0x000000000043e4c8 in connect_thread (arg=0x69ea68) at ua_server.c:84
> > > #3  0x00002ab9b89ed135 in start_thread () from /lib/libpthread.so.0
> > > #4  0x00002ab9b95e835e in clone () from /lib/libc.so.6
> > > #5  0x0000000000000000 in ?? ()
> > >
> > > Thread 3 (Thread 1090525504 (LWP 20611)):
> > > #0  0x00002ab9b89f1917 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> > >     /lib/libpthread.so.0
> > > #1  0x00000000004742c5 in watchdog_thread (arg=<value optimized out>) at
> > >     watchdog.c:307
> > > #2  0x00002ab9b89ed135 in start_thread () from /lib/libpthread.so.0
> > > #3  0x00002ab9b95e835e in clone () from /lib/libc.so.6
> > > #4  0x0000000000000000 in ?? ()
> > >
> > > Thread 2 (Thread 1098918208 (LWP 20666)):
> > > #0  0x00002ab9b89f4aef in waitpid () from /lib/libpthread.so.0
> > > #1  0x000000000046c643 in signal_handler (sig=11) at signal.c:167
> > > #2  <signal handler called>
> > > #3  0x00002ab9b959bc10 in strlen () from /lib/libc.so.6
> > > #4  0x000000000045a236 in bvsnprintf (buffer=0x6f5d90 "", maxlen=939,
> > >     format=0x514736 "", args=0x41801ec0) at bsnprintf.c:412
> > > #5  0x0000000000431f06 in bmsg (ua=0x6e8428, fmt=0x514734 "%s",
> > >     arg_ptr=0x41801ec0) at ua_output.c:738
> > > #6  0x0000000000432266 in UAContext::send_msg (this=0x9, fmt=0x0) at
> > >     ua_output.c:775
> > > #7  0x000000000042e3a5 in sql_handler (ctx=0x6e8428, num_field=3,
> > >     row=0x6e86b8) at ua_dotcmds.c:479
> > > #8  0x0000000000450852 in db_sql_query (mdb=0x6e8808, query=<value
> > >     optimized out>, result_handler=0x42e2b0 <sql_handler>, ctx=0x6e8428)
> > >     at postgresql.c:320
> > > #9  0x000000000042dcb7 in sql_cmd (ua=0x6e8428, cmd=<value optimized
> > >     out>) at ua_dotcmds.c:494
> > > #10 0x000000000042d9fb in do_a_dot_command (ua=0x6e8428,
> > >     cmd=0x6f1ef0 ".sql query=\"SELECT LogId, Time, LogText FROM Log
> > >     WHERE JobId='2676'\"") at ua_dotcmds.c:131
> > > #11 0x000000000043e6bf in handle_UA_client_request (arg=<value optimized
> > >     out>) at ua_server.c:145
> > > #12 0x000000000047487d in workq_server (arg=<value optimized out>) at
> > >     workq.c:357
> > > #13 0x00002ab9b89ed135 in start_thread () from /lib/libpthread.so.0
> > > #14 0x00002ab9b95e835e in clone () from /lib/libc.so.6
> > > #15 0x0000000000000000 in ?? ()
> > >
> > > Thread 1 (Thread 46977176811168 (LWP 20605)):
> > > #0  0x00002ab9b89f4621 in ?? () from /lib/libpthread.so.0
> > > #1  0x0000000000453db3 in bmicrosleep (sec=60, usec=0) at bsys.c:71
> > > #2  0x000000000042990a in wait_for_next_job (one_shot_job_to_run=<value
> > >     optimized out>) at scheduler.c:130
> > > #3  0x000000000040dcc0 in main (argc=0, argv=0x7ffff263e228) at dird.c:285
> > > #0  0x00002ab9b89f4621 in ?? () from /lib/libpthread.so.0
> > > #0  0x00002ab9b89f4621 in ?? () from /lib/libpthread.so.0
> > > No symbol table info available.
> > > #1  0x0000000000453db3 in bmicrosleep (sec=60, usec=0) at bsys.c:71
> > > 71           stat = nanosleep(&timeout, NULL);
> > > Current language:  auto; currently c++
> > > timeout = {tv_sec = 60, tv_nsec = 0}
> > > tv = {tv_sec = 26, tv_usec = 4631895}
> > > tz = {tz_minuteswest = 376, tz_dsttime = 0}
> > > stat = <value optimized out>
> > > #2  0x000000000042990a in wait_for_next_job (one_shot_job_to_run=<value
> > >     optimized out>) at scheduler.c:130
> > > 130         bmicrosleep(next_check_secs, 0); /* recheck once per minute */
> > > jcr = <value optimized out>
> > > job = (JOB *) 0x0
> > > run = <value optimized out>
> > > now = <value optimized out>
> > > next_job = <value optimized out>
> > > first = false
> > > #3  0x000000000040dcc0 in main (argc=0, argv=0x7ffff263e228) at dird.c:285
> > > 285       while ( (jcr = wait_for_next_job(runjob)) ) {
> > > ch = <value optimized out>
> > > jcr = (JCR *) 0x7
> > > no_signals = false
> > > test_config = false
> > > uid = 0x7ffff263fdc4 "root"
> > > gid = 0x7ffff263fdcc "bacula"
> > > #0  0x0000000000000000 in ?? ()
> > > No symbol table info available.
> > > #0  0x0000000000000000 in ?? ()
> > > No symbol table info available.
> > > #0  0x0000000000000000 in ?? ()
> > > No symbol table info available.
> > > #0  0x0000000000000000 in ?? ()
> > > No symbol table info available.
> >
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by: Splunk Inc.
> > Still grepping through log files to find problems?  Stop.
> > Now Search log events and configuration files using AJAX and a browser.
> > Download your FREE copy of Splunk now >>  http://get.splunk.com/
> > _______________________________________________
> > Bacula-devel mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/bacula-devel
