Hello,

On Tuesday 28 August 2007 23:27, Martin Simmons wrote:
> Looks like it might be another case of using the va_list pointer more than
> once (if it loops with goto again).  

I think you are right, but it is a slightly different case from the previous 
bug you found.

I've fought my way through years of varargs and then stdarg, dealing with 
all kinds of brain-damaged implementations, but this one really takes the 
cake.  

In this case, I am passing a variable of type va_list to another subroutine, 
so by the usual C pass-by-value rules one would expect the original variable 
never to be changed.  However, it turns out that depending on the compiler, 
and possibly the version of the compiler, the variable is effectively passed 
by reference (for example where va_list is an array type, as on x86_64), 
which means that the caller's copy can be damaged by the callee.  

Since the implementation of va_list differs radically between platforms, one 
can never know how to make a copy of a va_list by hand, so C99 defines 
va_copy().  That would resolve the problem for me quite logically, except 
that some OSes, such as HP, don't seem to have bothered to implement va_copy.
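
To illustrate the va_copy() approach, here is a minimal sketch of a
format-and-retry loop that takes a fresh copy of the argument list on every
pass, so the caller's va_list is never consumed twice.  The names
format_retry and format_retry_v are hypothetical, this is not the actual
bmsg() code, and it assumes a C99 vsnprintf() that returns the length it
would have written.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not Bacula's bmsg): format into a malloc'ed buffer,
 * growing the buffer and retrying until the whole message fits. */
char *format_retry_v(size_t initial_len, const char *fmt, va_list arg_ptr)
{
   size_t maxlen = initial_len > 0 ? initial_len : 1;
   char *msg = malloc(maxlen);
   char *tmp;

   if (!msg) {
      return NULL;
   }
   for (;;) {
      va_list ap;
      int len;

      va_copy(ap, arg_ptr);            /* fresh copy for every attempt */
      len = vsnprintf(msg, maxlen, fmt, ap);
      va_end(ap);                      /* each va_copy needs its va_end */

      if (len < 0) {                   /* formatting error */
         free(msg);
         return NULL;
      }
      if ((size_t)len < maxlen) {      /* it fit, including the NUL */
         return msg;
      }
      maxlen = (size_t)len + 1;        /* grow to the needed size, retry */
      tmp = realloc(msg, maxlen);
      if (!tmp) {
         free(msg);
         return NULL;
      }
      msg = tmp;
   }
}

/* Variadic front end, shaped like a send_msg() style wrapper. */
char *format_retry(size_t initial_len, const char *fmt, ...)
{
   va_list arg_ptr;
   char *msg;

   va_start(arg_ptr, fmt);
   msg = format_retry_v(initial_len, fmt, arg_ptr);
   va_end(arg_ptr);
   return msg;
}
```

A caller would use it as format_retry(4, "%s-%d", "hello", 42) and free()
the result; the first pass is too small, and the retry works only because
it reads from a new copy rather than from the already consumed list.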

> Unlike the other case, it is probably 
> impossible to fix within bmsg itself, so you need to loop in all of the
> callers.  Today's Lisp reference: ellipsis is not &rest :-)
>
> I don't think the value of format in the backtrace is necessarily a
> problem, because something might be adjusting the parameter within
> vsnprintf, which could show up in the backtrace due to optimizations (value
> in the same register).

I think I will implement va_copy, and a BRAIN_DAMAGED_STDARG tag that we can 
turn on for the dinosaur OSes.
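
A rough sketch of what such a fallback might look like follows.  How
BRAIN_DAMAGED_STDARG would really be wired in is an assumption here: in
this sketch it only takes effect when the headers provide no va_copy.  The
memcpy() fallback is likewise an assumption that is only reasonable where
va_list is a plain pointer or scalar type; it is not correct where va_list
is an array type.  The sum_twice functions are hypothetical and exist only
to exercise the macro.

```c
#include <stdarg.h>
#include <string.h>

/* Assumed meaning of the flag: this platform has no usable va_copy. */
#if defined(BRAIN_DAMAGED_STDARG) && !defined(va_copy)
# ifdef __va_copy
   /* Older compilers sometimes spell it __va_copy. */
#  define va_copy(dst, src) __va_copy(dst, src)
# else
   /* Last resort: byte copy, valid only for pointer-like va_list types. */
#  define va_copy(dst, src) memcpy(&(dst), &(src), sizeof(va_list))
# endif
#endif

/* Hypothetical demo: walk the same argument list twice without ever
 * reading from the caller's va_list directly. */
static int sum_twice_v(int count, va_list ap)
{
   va_list copy;
   int i, total = 0;

   va_copy(copy, ap);                  /* first pass over a copy */
   for (i = 0; i < count; i++) {
      total += va_arg(copy, int);
   }
   va_end(copy);

   va_copy(copy, ap);                  /* second pass over a new copy */
   for (i = 0; i < count; i++) {
      total += va_arg(copy, int);
   }
   va_end(copy);
   return total;
}

int sum_twice(int count, ...)
{
   va_list ap;
   int total;

   va_start(ap, count);
   total = sum_twice_v(count, ap);
   va_end(ap);
   return total;
}
```

On a compiler that already provides va_copy the fallback block compiles
away and the standard macro is used unchanged.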

Best regards,

Kern

>
> __Martin
>
> >>>>> On Tue, 28 Aug 2007 07:53:34 +0200, Kern Sibbald said:
> >
> > Hello Dirk,
> >
> > I've looked over the code, and if there is something wrong with it, I am
> > certainly missing it.  Perhaps someone on the devel list will see
> > something that I cannot.
> >
> > At this point, I'm leaning toward a compiler bug.  Could you give me the
> > following information?
> >
> > 1. The version of the compiler and the architecture for each machine
> > where you have the failure.
> > 2. The version of the compiler and the architecture for each machine
> > where you do not have the failure.
> >
> > Could you give me the complete compile line with all the options for both
> > dird/ua_output.c and lib/bsnprintf.c?  Either edit the Makefile and
> > remove the $(NO_ECHO) in front of the compile rules (the .c.o: and .cc.o:
> > lines), or set the environment variable NO_ECHO to the empty string.
> >
> > Could you set the compile optimization for those two files to -O0 (minus
> > oh zero)?  Either edit the Makefile or set it on the command line via a
> > preceding environment variable setting of CFLAGS.   Then test again and
> > see if it fails.
> >
> > As a separate test, if the above test still fails, could you comment out
> > the #define USE_BSNPRINTF 1
> > line in src/version.h and then rebuild everything?
> >
> > Another interesting test would be to put:
> >
> >    Dmsg1(000, "fmt=%s\n", fmt);
> >
> > just after the line "again:" at line 737 in src/dird/ua_output.c  as well
> > as:
> >
> >    Dmsg0(000, "goto again\n");
> >
> > after "msg = realloc_pool_memory(msg, maxlen + maxlen/2);" at line
> > 741.  Then report what it prints when the seg fault occurs.
> >
> > Best regards,
> >
> > Kern
> >
> > PS: for the list, the problem is clearly (according to the traceback) in
> > Thread 2 between stack frame 4 and 5 where the argument "fmt" in stack
> > frame 5 should be identical to argument "format" in stack frame 4, but
> > has been shifted by 2 bytes!
> >
> > On Monday 27 August 2007 22:57, Dirk H Bartley wrote:
> > > Well, I'm persistent but still not succeeding.  I went back to revision
> > > 5397 which is before you made some changes to prevent the director
> > > segfault.
> > >
> > > I've got -g in the CC flags so the symbol table is now there.  That's
> > > good.
> > >
> > > In my debugging, I don't have the knowledge to use gdb to get what I
> > > want.
> > >
> > > All the bad stuff is happening somewhere between frame 7 and frame 4.
> > >
> > > In sql_handler there is the line:
> > > ua->send_msg("%s", rows.c_str());
> > > where rows looks like the correct value.
> > >
> > > which calls
> > > void UAContext::send_msg(const char *fmt, ...)
> > > {
> > >    va_list arg_ptr;
> > >    va_start(arg_ptr, fmt);
> > >    bmsg(this, fmt, arg_ptr);
> > >    va_end(arg_ptr);
> > > }
> > >
> > > which is where I am unable to look at the value any more.  Only fmt
> > > gives the "%s" that it should.
> > >
> > > Anyways, I'll try another day, but I can't say that it will work
> > > properly by the weekend.  I may need to be able to toss ideas back and
> > > forth while I'm doing it to succeed.  My experience with the commands
> > > in gdb is almost null.
> > >
> > > The head does not crash.  It just returns null to the joblog when it
> > > should return the long log of the job actually running.
> > >
> > > Dirk
> > >
> > > Backtrace below with symbol table working.  This is at
> > > svn up -r5397
> > >
> > > Thread 2 (Thread 1098918208 (LWP 20666)):
> > > #0  0x00002ab9b89f4aef in waitpid () from /lib/libpthread.so.0
> > > #1  0x000000000046c643 in signal_handler (sig=11) at signal.c:167
> > > #2  <signal handler called>
> > > #3  0x00002ab9b959bc10 in strlen () from /lib/libc.so.6
> > > #4  0x000000000045a236 in bvsnprintf (buffer=0x6f5d90 "", maxlen=939,
> > >     format=0x514736 "", args=0x41801ec0) at bsnprintf.c:412
> > > #5  0x0000000000431f06 in bmsg (ua=0x6e8428, fmt=0x514734 "%s",
> > >     arg_ptr=0x41801ec0) at ua_output.c:738
> > > #6  0x0000000000432266 in UAContext::send_msg (this=0x9, fmt=0x0)
> > >     at ua_output.c:775
> > > #7  0x000000000042e3a5 in sql_handler (ctx=0x6e8428, num_field=3,
> > >     row=0x6e86b8) at ua_dotcmds.c:479
> > >
> > >
> > > -------- Forwarded Message --------
> > > From: [EMAIL PROTECTED]
> > > To: [EMAIL PROTECTED]
> > > Subject: Bacula GDB traceback of bacula-dir
> > > Date: Mon, 27 Aug 2007 16:17:32 -0500 (EDT)
> > >
> > > Using host libthread_db library "/lib/libthread_db.so.1".
> > > [Thread debugging using libthread_db enabled]
> > > [New Thread 46977176811168 (LWP 20605)]
> > > [New Thread 1098918208 (LWP 20666)]
> > > [New Thread 1090525504 (LWP 20611)]
> > > [New Thread 1082132800 (LWP 20610)]
> > > 0x00002ab9b89f4621 in ?? () from /lib/libpthread.so.0
> > > $1 = "srvalum3-dir", '\0' <repeats 17 times>
> > > $2 = 0x69e088 "bacula-dir"
> > > $3 = 0x69e0c8 "/usr/sbin/bacula-dir"
> > > $4 = 0x6e4638 "PostgreSQL"
> > > $5 = 0x527e38 "2.3.1 (21 August 2007)"
> > > $6 = 0x50bd4b "x86_64-pc-linux-gnu"
> > > $7 = 0x50bd44 "gentoo"
> > > $8 = 0x5155da ""
> > > #0  0x00002ab9b89f4621 in ?? () from /lib/libpthread.so.0
> > > #1  0x0000000000453db3 in bmicrosleep (sec=60, usec=0) at bsys.c:71
> > > #2  0x000000000042990a in wait_for_next_job
> > >     (one_shot_job_to_run=<value optimized out>) at scheduler.c:130
> > > #3  0x000000000040dcc0 in main (argc=0, argv=0x7ffff263e228) at dird.c:285
> > >
> > > Thread 4 (Thread 1082132800 (LWP 20610)):
> > > #0  0x00002ab9b95e23b2 in select () from /lib/libc.so.6
> > > #1  0x000000000045560c in bnet_thread_server (addrs=0x69ea68,
> > >     max_clients=20, client_wq=0x697f60,
> > >     handle_client_request=0x43e4d0 <handle_UA_client_request>)
> > >     at bnet_server.c:161
> > > #2  0x000000000043e4c8 in connect_thread (arg=0x69ea68) at ua_server.c:84
> > > #3  0x00002ab9b89ed135 in start_thread () from /lib/libpthread.so.0
> > > #4  0x00002ab9b95e835e in clone () from /lib/libc.so.6
> > > #5  0x0000000000000000 in ?? ()
> > >
> > > Thread 3 (Thread 1090525504 (LWP 20611)):
> > > #0  0x00002ab9b89f1917 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
> > >     from /lib/libpthread.so.0
> > > #1  0x00000000004742c5 in watchdog_thread (arg=<value optimized out>)
> > >     at watchdog.c:307
> > > #2  0x00002ab9b89ed135 in start_thread () from /lib/libpthread.so.0
> > > #3  0x00002ab9b95e835e in clone () from /lib/libc.so.6
> > > #4  0x0000000000000000 in ?? ()
> > >
> > > Thread 2 (Thread 1098918208 (LWP 20666)):
> > > #0  0x00002ab9b89f4aef in waitpid () from /lib/libpthread.so.0
> > > #1  0x000000000046c643 in signal_handler (sig=11) at signal.c:167
> > > #2  <signal handler called>
> > > #3  0x00002ab9b959bc10 in strlen () from /lib/libc.so.6
> > > #4  0x000000000045a236 in bvsnprintf (buffer=0x6f5d90 "", maxlen=939,
> > >     format=0x514736 "", args=0x41801ec0) at bsnprintf.c:412
> > > #5  0x0000000000431f06 in bmsg (ua=0x6e8428, fmt=0x514734 "%s",
> > >     arg_ptr=0x41801ec0) at ua_output.c:738
> > > #6  0x0000000000432266 in UAContext::send_msg (this=0x9, fmt=0x0)
> > >     at ua_output.c:775
> > > #7  0x000000000042e3a5 in sql_handler (ctx=0x6e8428, num_field=3,
> > >     row=0x6e86b8) at ua_dotcmds.c:479
> > > #8  0x0000000000450852 in db_sql_query (mdb=0x6e8808,
> > >     query=<value optimized out>, result_handler=0x42e2b0 <sql_handler>,
> > >     ctx=0x6e8428) at postgresql.c:320
> > > #9  0x000000000042dcb7 in sql_cmd (ua=0x6e8428, cmd=<value optimized out>)
> > >     at ua_dotcmds.c:494
> > > #10 0x000000000042d9fb in do_a_dot_command (ua=0x6e8428,
> > >     cmd=0x6f1ef0 ".sql query=\"SELECT LogId, Time, LogText FROM Log WHERE JobId='2676'\"")
> > >     at ua_dotcmds.c:131
> > > #11 0x000000000043e6bf in handle_UA_client_request
> > >     (arg=<value optimized out>) at ua_server.c:145
> > > #12 0x000000000047487d in workq_server (arg=<value optimized out>)
> > >     at workq.c:357
> > > #13 0x00002ab9b89ed135 in start_thread () from /lib/libpthread.so.0
> > > #14 0x00002ab9b95e835e in clone () from /lib/libc.so.6
> > > #15 0x0000000000000000 in ?? ()
> > >
> > > Thread 1 (Thread 46977176811168 (LWP 20605)):
> > > #0  0x00002ab9b89f4621 in ?? () from /lib/libpthread.so.0
> > > #1  0x0000000000453db3 in bmicrosleep (sec=60, usec=0) at bsys.c:71
> > > #2  0x000000000042990a in wait_for_next_job
> > >     (one_shot_job_to_run=<value optimized out>) at scheduler.c:130
> > > #3  0x000000000040dcc0 in main (argc=0, argv=0x7ffff263e228) at dird.c:285
> > > #0  0x00002ab9b89f4621 in ?? () from /lib/libpthread.so.0
> > > #0  0x00002ab9b89f4621 in ?? () from /lib/libpthread.so.0
> > > No symbol table info available.
> > > #1  0x0000000000453db3 in bmicrosleep (sec=60, usec=0) at bsys.c:71
> > > 71           stat = nanosleep(&timeout, NULL);
> > > Current language:  auto; currently c++
> > > timeout = {tv_sec = 60, tv_nsec = 0}
> > > tv = {tv_sec = 26, tv_usec = 4631895}
> > > tz = {tz_minuteswest = 376, tz_dsttime = 0}
> > > stat = <value optimized out>
> > > #2  0x000000000042990a in wait_for_next_job
> > >     (one_shot_job_to_run=<value optimized out>) at scheduler.c:130
> > > 130         bmicrosleep(next_check_secs, 0); /* recheck once per minute */
> > > jcr = <value optimized out>
> > > job = (JOB *) 0x0
> > > run = <value optimized out>
> > > now = <value optimized out>
> > > next_job = <value optimized out>
> > > first = false
> > > #3  0x000000000040dcc0 in main (argc=0, argv=0x7ffff263e228) at dird.c:285
> > > 285       while ( (jcr = wait_for_next_job(runjob)) ) {
> > > ch = <value optimized out>
> > > jcr = (JCR *) 0x7
> > > no_signals = false
> > > test_config = false
> > > uid = 0x7ffff263fdc4 "root"
> > > gid = 0x7ffff263fdcc "bacula"
> > > #0  0x0000000000000000 in ?? ()
> > > No symbol table info available.
> > > #0  0x0000000000000000 in ?? ()
> > > No symbol table info available.
> > > #0  0x0000000000000000 in ?? ()
> > > No symbol table info available.
> > > #0  0x0000000000000000 in ?? ()
> > > No symbol table info available.
> >
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by: Splunk Inc.
> > Still grepping through log files to find problems?  Stop.
> > Now Search log events and configuration files using AJAX and a browser.
> > Download your FREE copy of Splunk now >>  http://get.splunk.com/
> > _______________________________________________
> > Bacula-devel mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/bacula-devel
>
