On Sun, Dec 01, 2019 at 11:57:08PM +0100, Klemens Nanni wrote:

> On Sun, Dec 01, 2019 at 11:41:51PM +0100, Klemens Nanni wrote:
> > It crashed again:
> > 
> >     Dec  1 23:36:50 eru unwind[88042]: startup
> >     Dec  1 23:38:18 eru unwind[44830]: frontend exiting
> >     Dec  1 23:38:18 eru unwind[88042]: resolver terminated; signal 11
> >     Dec  1 23:38:18 eru unwind[88042]: terminating
> Sorry, this is the log from the crash that happened right after I
> restarted unwind, this time while being in the Air Canada lounge.
> 
> Here's the log that corresponds to the backtrace from the previous mail:
> 
>       Dec  1 19:22:50 eru unwind[44200]: startup
>       Dec  1 21:43:42 eru unwind[38187]: frontend exiting
>       Dec  1 21:43:42 eru unwind[57687]: resolver terminated; signal 11
>       Dec  1 21:43:42 eru unwind[57687]: terminating
> 
> As you can see, it's the same crash;  looking at the newest crash
> (from 23:38:18) one can see that it is the exact same code path:
> 
>       Program terminated with signal SIGSEGV, Segmentation fault.
>       #0  try_next_resolver (rq=0x15eba8268200) at /s/sbin/unwind/resolver.c:788
>       788             rq->running++;
>       (gdb) bt
>       #0  try_next_resolver (rq=0x15eba8268200) at /s/sbin/unwind/resolver.c:788
>       #1  0x000015e9782eb0ba in setup_query (query_imsg=<optimized out>) at /s/sbin/unwind/resolver.c:715
>       #2  0x000015e9782ea974 in resolver_dispatch_frontend (fd=<optimized out>, event=<optimized out>, bula=0x15ec61a0d000) at /s/sbin/unwind/resolver.c:483
>       #3  0x000015e97838ac5f in event_process_active (base=<optimized out>) at /usr/src/lib/libevent/event.c:334
>       #4  event_base_loop (base=0x15ec4e373000, flags=0) at /usr/src/lib/libevent/event.c:483
>       #5  0x000015e9782e9c56 in resolver (debug=<optimized out>, verbose=<optimized out>) at /s/sbin/unwind/resolver.c:383
>       #6  0x000015e9782f2ce2 in main (argc=0, argv=0x7f7fffff1588) at /s/sbin/unwind/unwind.c:173
>       (gdb) l
>       783             evtimer_add(&rq->timer_ev, &tv);
>       784     
>       785             if (resolve(res, query_imsg->qname, query_imsg->t,
>       786                 query_imsg->c, query_imsg, resolve_done) != 0)
>       787                     goto err;
>       788             rq->running++;
>       789     
>       790             return 0;
>       791     
>       792      err:
>       (gdb) p rq
>       $1 = (struct running_query *) 0x15eba8268200
>       (gdb) p *rq
>       Cannot access memory at address 0x15eba8268200
> 
> So why does evtimer_add() work on dereferencing `rq' but the increment
> fails, even though resolve() in between apparently does not touch it?
> 
> While sitting at the lounge, I'm running unwind with DEBUG='-g3 -O0' to
> get a better backtrace.

Try this,

        -Otto

Index: resolver.c
===================================================================
RCS file: /cvs/src/sbin/unwind/resolver.c,v
retrieving revision 1.88
diff -u -p -r1.88 resolver.c
--- resolver.c  2 Dec 2019 06:26:52 -0000       1.88
+++ resolver.c  2 Dec 2019 08:44:08 -0000
@@ -782,10 +782,10 @@ try_next_resolver(struct running_query *
        }
        evtimer_add(&rq->timer_ev, &tv);
 
+       rq->running++;
        if (resolve(res, query_imsg->qname, query_imsg->t,
            query_imsg->c, query_imsg, resolve_done) != 0)
                goto err;
-       rq->running++;
 
        return 0;
 

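One plausible reading of why the reordering in the diff helps, offered as a sketch rather than a confirmed diagnosis: if resolve() can run its completion callback before it returns, and that callback releases the query once nothing is in flight, then rq may already be freed by the time control comes back to try_next_resolver(). Counting the lookup first means nothing touches rq after the call. The thread does not show resolve() or resolve_done(), so both behaviours are assumptions, and every name below (fake_resolve, query_done, start_lookup, freed_count) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for unwind's struct running_query. */
struct running_query {
	int running;	/* lookups still in flight */
};

/* Counts how many queries the callback has released. */
static int freed_count;

/* Hypothetical completion callback: release rq when nothing is in flight. */
static void
query_done(struct running_query *rq)
{
	if (--rq->running <= 0) {
		free(rq);
		freed_count++;
	}
}

/*
 * Hypothetical resolve(): assumed to complete synchronously, so the
 * callback runs (and may free rq) before this function returns.
 * In this sketch it never fails.
 */
static int
fake_resolve(struct running_query *rq, void (*cb)(struct running_query *))
{
	cb(rq);
	return 0;
}

/* Same ordering as the patch: count the lookup, then start it. */
static int
start_lookup(struct running_query *rq)
{
	assert(rq != NULL);
	rq->running++;
	if (fake_resolve(rq, query_done) != 0)
		return -1;
	/* rq may be gone at this point, so it is not dereferenced again. */
	return 0;
}
```

With the original ordering, the increment would come after fake_resolve() and would write to memory the callback had already freed, which matches the "Cannot access memory" output from gdb under the assumptions above.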