Hello Dirk,

I've looked over the code, and if there is something wrong with it, I am 
certainly missing it.  Perhaps someone on the devel list will see something 
that I cannot.

At this point, I suspect a compiler bug.  Could you give me the 
following information?

1. The version of the compiler and the architecture for each machine where 
    you have the failure.
2. The version of the compiler and the architecture for each machine where
    you do not have the failure.

Could you give me the complete compile line with all the options for both 
dird/ua_output.c and lib/bsnprintf.c?  Either edit the Makefile and remove 
the $(NO_ECHO) in front of the compile rules (the .c.o: and .cc.o: lines), or 
set the environment variable NO_ECHO to the empty string.

Could you set the compile optimization for those two files to -O0 (minus oh 
zero)?  Either edit the Makefile or set CFLAGS as an environment variable 
in front of the make command.  Then test again and see if it fails.

As a separate test, if the above test still fails, could you comment out the
  #define USE_BSNPRINTF 1
line in src/version.h and then rebuild everything?

Another interesting test would be to put:

   Dmsg1(000, "fmt=%s\n", fmt);

just after the line "again:" at line 737 in src/dird/ua_output.c  as well as:

   Dmsg0(000, "goto again\n");

after "msg = realloc_pool_memory(msg, maxlen + maxlen/2);" at line
741.  Then report what it prints when the seg fault occurs.
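
To show where the two debug lines sit relative to the retry, here is a minimal, self-contained sketch.  It is not the code in ua_output.c: format_with_retry() and send_msg_sketch() are hypothetical stand-ins, std::vsnprintf and std::realloc replace bvsnprintf and realloc_pool_memory, and fprintf(stderr, ...) replaces the Dmsg macros.  The sketch copies the va_list on every pass because a va_list cannot be reused once it has been consumed; that is a property of the sketch, not a statement about what the real code does.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for bmsg(): format into a buffer and grow it by
// half whenever the output did not fit.  *retries counts the "goto again"
// passes.
std::string format_with_retry(const char *fmt, va_list arg_ptr, int *retries)
{
   assert(fmt != nullptr);
   int maxlen = 8;                   // deliberately tiny so the retry path runs
   char *msg = static_cast<char *>(std::malloc(maxlen + 1));
   *retries = 0;
   if (msg == nullptr) {
      return std::string();
   }
   for (;;) {                        // plays the role of the "again:" label
      // Where Dmsg1(000, "fmt=%s\n", fmt) would go.
      std::fprintf(stderr, "fmt=%s\n", fmt);
      va_list ap;
      va_copy(ap, arg_ptr);          // fresh copy for each formatting pass
      int len = std::vsnprintf(msg, maxlen + 1, fmt, ap);
      va_end(ap);
      if (len < 0) {                 // formatting error
         std::free(msg);
         return std::string();
      }
      if (len <= maxlen) {
         break;                      // the whole message fit
      }
      maxlen = maxlen + maxlen / 2;  // same growth rule as in the mail
      char *bigger = static_cast<char *>(std::realloc(msg, maxlen + 1));
      if (bigger == nullptr) {
         std::free(msg);
         return std::string();
      }
      msg = bigger;
      // Where Dmsg0(000, "goto again\n") would go.
      std::fprintf(stderr, "goto again\n");
      ++*retries;
   }
   std::string out(msg);
   std::free(msg);
   return out;
}

// Variadic front end with the same shape as the quoted UAContext::send_msg.
std::string send_msg_sketch(int *retries, const char *fmt, ...)
{
   va_list arg_ptr;
   va_start(arg_ptr, fmt);
   std::string out = format_with_retry(fmt, arg_ptr, retries);
   va_end(arg_ptr);
   return out;
}
```

If the value printed for fmt changes between passes, or is already wrong on the first pass, that tells us on which side of the call the pointer is being damaged.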

Best regards,

Kern

PS: for the list, the problem is clearly (according to the traceback) in 
Thread 2 between stack frame 4 and 5 where the argument "fmt" in stack frame 
5 should be identical to argument "format" in stack frame 4, but has been 
shifted by 2 bytes!
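
To make the 2 byte shift concrete, here is a small sketch using only the two addresses from the traceback.  The helper names (shift_bytes, view_shifted) are invented for illustration and do not exist in Bacula.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Addresses copied from the Thread 2 backtrace quoted below.
constexpr std::uintptr_t kFmtInBmsg = 0x514734;          // frame 5: fmt    -> "%s"
constexpr std::uintptr_t kFormatInBvsnprintf = 0x514736; // frame 4: format -> ""

// Distance in bytes between the pointer the caller passed and the one the
// callee received.
constexpr std::ptrdiff_t shift_bytes()
{
   return static_cast<std::ptrdiff_t>(kFormatInBvsnprintf - kFmtInBmsg);
}

// Hypothetical helper: view a C string starting `shift` bytes past its start.
// The shift must not run past the terminating NUL.
const char *view_shifted(const char *s, std::size_t shift)
{
   assert(shift <= std::strlen(s));
   return s + shift;
}
```

Here shift_bytes() is 2, and view_shifted("%s", 2) points at the terminating NUL of "%s", i.e. the empty string.  That matches format=0x514736 "" in frame 4.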


On Monday 27 August 2007 22:57, Dirk H Bartley wrote:
> Well, I'm persistent but still not succeeding.  I went back to revision
> 5397 which is before you made some changes to prevent the director
> segfault.
>
> I've got -g in the CC flags so the symbol table is now there.  That's
> good.
>
> In my debugging, I don't have the gdb knowledge to get what I want.
>
> All the bad stuff is happening somewhere between frame 7 and frame 4.
>
> In sql_handler there is the line:
> ua->send_msg("%s", rows.c_str());
> where rows looks like it holds the correct value.
>
> which calls
> void UAContext::send_msg(const char *fmt, ...)
> {
>    va_list arg_ptr;
>    va_start(arg_ptr, fmt);
>    bmsg(this, fmt, arg_ptr);
>    va_end(arg_ptr);
> }
>
> which is where I am unable to look at the value any more.  Only fmt
> gives the "%s" that it should.
>
> Anyway, I'll try another day, but I can't say that it will work
> properly by the weekend.  I may need to be able to toss ideas back and
> forth while I'm doing it to succeed.  My experience with the commands in
> gdb is almost nil.
>
> The head does not crash.  It just returns null to the joblog when it
> should return the long log of the job actually running.
>
> Dirk
>
> Backtrace below with symbol table working.  This is at
> svn up -r5397
>
> Thread 2 (Thread 1098918208 (LWP 20666)):
> #0  0x00002ab9b89f4aef in waitpid () from /lib/libpthread.so.0
> #1  0x000000000046c643 in signal_handler (sig=11) at signal.c:167
> #2  <signal handler called>
> #3  0x00002ab9b959bc10 in strlen () from /lib/libc.so.6
> #4  0x000000000045a236 in bvsnprintf (buffer=0x6f5d90 "", maxlen=939,
>     format=0x514736 "", args=0x41801ec0) at bsnprintf.c:412
> #5  0x0000000000431f06 in bmsg (ua=0x6e8428, fmt=0x514734 "%s",
>     arg_ptr=0x41801ec0) at ua_output.c:738
> #6  0x0000000000432266 in UAContext::send_msg (this=0x9, fmt=0x0)
>     at ua_output.c:775
> #7  0x000000000042e3a5 in sql_handler (ctx=0x6e8428, num_field=3,
>     row=0x6e86b8) at ua_dotcmds.c:479
>
>
> -------- Forwarded Message --------
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Bacula GDB traceback of bacula-dir
> Date: Mon, 27 Aug 2007 16:17:32 -0500 (EDT)
>
> Using host libthread_db library "/lib/libthread_db.so.1".
> [Thread debugging using libthread_db enabled]
> [New Thread 46977176811168 (LWP 20605)]
> [New Thread 1098918208 (LWP 20666)]
> [New Thread 1090525504 (LWP 20611)]
> [New Thread 1082132800 (LWP 20610)]
> 0x00002ab9b89f4621 in ?? () from /lib/libpthread.so.0
> $1 = "srvalum3-dir", '\0' <repeats 17 times>
> $2 = 0x69e088 "bacula-dir"
> $3 = 0x69e0c8 "/usr/sbin/bacula-dir"
> $4 = 0x6e4638 "PostgreSQL"
> $5 = 0x527e38 "2.3.1 (21 August 2007)"
> $6 = 0x50bd4b "x86_64-pc-linux-gnu"
> $7 = 0x50bd44 "gentoo"
> $8 = 0x5155da ""
> #0  0x00002ab9b89f4621 in ?? () from /lib/libpthread.so.0
> #1  0x0000000000453db3 in bmicrosleep (sec=60, usec=0) at bsys.c:71
> #2  0x000000000042990a in wait_for_next_job (one_shot_job_to_run=<value
>     optimized out>) at scheduler.c:130
> #3  0x000000000040dcc0 in main (argc=0, argv=0x7ffff263e228) at dird.c:285
>
> Thread 4 (Thread 1082132800 (LWP 20610)):
> #0  0x00002ab9b95e23b2 in select () from /lib/libc.so.6
> #1  0x000000000045560c in bnet_thread_server (addrs=0x69ea68,
>     max_clients=20, client_wq=0x697f60,
>     handle_client_request=0x43e4d0 <handle_UA_client_request>)
>     at bnet_server.c:161
> #2  0x000000000043e4c8 in connect_thread (arg=0x69ea68) at ua_server.c:84
> #3  0x00002ab9b89ed135 in start_thread () from /lib/libpthread.so.0
> #4  0x00002ab9b95e835e in clone () from /lib/libc.so.6
> #5  0x0000000000000000 in ?? ()
>
> Thread 3 (Thread 1090525504 (LWP 20611)):
> #0  0x00002ab9b89f1917 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
> #1  0x00000000004742c5 in watchdog_thread (arg=<value optimized out>)
>     at watchdog.c:307
> #2  0x00002ab9b89ed135 in start_thread () from /lib/libpthread.so.0
> #3  0x00002ab9b95e835e in clone () from /lib/libc.so.6
> #4  0x0000000000000000 in ?? ()
>
> Thread 2 (Thread 1098918208 (LWP 20666)):
> #0  0x00002ab9b89f4aef in waitpid () from /lib/libpthread.so.0
> #1  0x000000000046c643 in signal_handler (sig=11) at signal.c:167
> #2  <signal handler called>
> #3  0x00002ab9b959bc10 in strlen () from /lib/libc.so.6
> #4  0x000000000045a236 in bvsnprintf (buffer=0x6f5d90 "", maxlen=939,
>     format=0x514736 "", args=0x41801ec0) at bsnprintf.c:412
> #5  0x0000000000431f06 in bmsg (ua=0x6e8428, fmt=0x514734 "%s",
>     arg_ptr=0x41801ec0) at ua_output.c:738
> #6  0x0000000000432266 in UAContext::send_msg (this=0x9, fmt=0x0)
>     at ua_output.c:775
> #7  0x000000000042e3a5 in sql_handler (ctx=0x6e8428, num_field=3,
>     row=0x6e86b8) at ua_dotcmds.c:479
> #8  0x0000000000450852 in db_sql_query (mdb=0x6e8808,
>     query=<value optimized out>, result_handler=0x42e2b0 <sql_handler>,
>     ctx=0x6e8428) at postgresql.c:320
> #9  0x000000000042dcb7 in sql_cmd (ua=0x6e8428, cmd=<value optimized out>)
>     at ua_dotcmds.c:494
> #10 0x000000000042d9fb in do_a_dot_command (ua=0x6e8428,
>     cmd=0x6f1ef0 ".sql query=\"SELECT LogId, Time, LogText FROM Log WHERE JobId='2676'\"")
>     at ua_dotcmds.c:131
> #11 0x000000000043e6bf in handle_UA_client_request (arg=<value optimized out>)
>     at ua_server.c:145
> #12 0x000000000047487d in workq_server (arg=<value optimized out>)
>     at workq.c:357
> #13 0x00002ab9b89ed135 in start_thread () from /lib/libpthread.so.0
> #14 0x00002ab9b95e835e in clone () from /lib/libc.so.6
> #15 0x0000000000000000 in ?? ()
>
> Thread 1 (Thread 46977176811168 (LWP 20605)):
> #0  0x00002ab9b89f4621 in ?? () from /lib/libpthread.so.0
> #1  0x0000000000453db3 in bmicrosleep (sec=60, usec=0) at bsys.c:71
> #2  0x000000000042990a in wait_for_next_job (one_shot_job_to_run=<value
>     optimized out>) at scheduler.c:130
> #3  0x000000000040dcc0 in main (argc=0, argv=0x7ffff263e228) at dird.c:285
> #0  0x00002ab9b89f4621 in ?? () from /lib/libpthread.so.0
> #0  0x00002ab9b89f4621 in ?? () from /lib/libpthread.so.0
> No symbol table info available.
> #1  0x0000000000453db3 in bmicrosleep (sec=60, usec=0) at bsys.c:71
> 71       stat = nanosleep(&timeout, NULL);
> Current language:  auto; currently c++
> timeout = {tv_sec = 60, tv_nsec = 0}
> tv = {tv_sec = 26, tv_usec = 4631895}
> tz = {tz_minuteswest = 376, tz_dsttime = 0}
> stat = <value optimized out>
> #2  0x000000000042990a in wait_for_next_job (one_shot_job_to_run=<value
>     optimized out>) at scheduler.c:130
> 130              bmicrosleep(next_check_secs, 0); /* recheck once per minute */
> jcr = <value optimized out>
> job = (JOB *) 0x0
> run = <value optimized out>
> now = <value optimized out>
> next_job = <value optimized out>
> first = false
> #3  0x000000000040dcc0 in main (argc=0, argv=0x7ffff263e228) at dird.c:285
> 285      while ( (jcr = wait_for_next_job(runjob)) ) {
> ch = <value optimized out>
> jcr = (JCR *) 0x7
> no_signals = false
> test_config = false
> uid = 0x7ffff263fdc4 "root"
> gid = 0x7ffff263fdcc "bacula"
> #0  0x0000000000000000 in ?? ()
> No symbol table info available.
> #0  0x0000000000000000 in ?? ()
> No symbol table info available.
> #0  0x0000000000000000 in ?? ()
> No symbol table info available.
> #0  0x0000000000000000 in ?? ()
> No symbol table info available.
