Re: Help with debugging a crash on a BLE gateway application

Pritish Gandhi Wed, 19 Apr 2017 10:27:40 -0700

Hi Marko,
You're right, MYNEWT-656 looks eerily similar to what I'm seeing.
Yeah, my next task is to upgrade to the latest OS version.
Thanks again,
Pritish


On Wed, Apr 19, 2017 at 10:21 AM, marko kiiskila <[email protected]> wrote:

>
> > On Apr 19, 2017, at 10:06 AM, Pritish Gandhi <[email protected]> wrote:
> >
> > I'm sorry I forgot to give more details about my setup and architecture.
> >
> > I'm running this on an STM32F4Discovery EVB (so STM32F407VG). It can't be
> > the issue with the controller stack, since I'm not using the controller. In
> > my setup the STM32F4Discovery is driving an externally connected Broadcom
> > BT controller over UART HCI. So I'm only building the HCI component along
> > with the UART transport.
> >
> > Looking at the STM32F407VG data sheet:
> > ICSR:0x0440f803: SysTick interrupt is pending (not interesting)
> > HFSR:0x40000000: Forced hard fault (FORCED bit)
> > CFSR:0x00000400: Imprecise data bus error (IMPRECISERR bit)
> >
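To double-check the register reading above, here is a minimal C sketch that decodes the three values from the crash log. The bit positions are taken from the ARMv7-M System Control Block layout; the struct and function names (fault_summary, decode_fault) are made up for illustration and are not part of Mynewt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit fields of the ARMv7-M SCB registers that matter for this crash. */
#define ICSR_VECTACTIVE_MASK   0x1FFu       /* exception being handled   */
#define ICSR_VECTPENDING_SHIFT 12           /* highest pending exception */
#define ICSR_VECTPENDING_MASK  0x1FFu
#define HFSR_FORCED            (1u << 30)   /* escalated from another fault */
#define CFSR_PRECISERR         (1u << 9)    /* BFSR: precise data bus error */
#define CFSR_IMPRECISERR       (1u << 10)   /* BFSR: imprecise data bus error */
#define CFSR_BFARVALID         (1u << 15)   /* BFAR holds the fault address */

/* Hypothetical summary type, for illustration only. */
struct fault_summary {
    unsigned active_exception;   /* 3 = HardFault */
    unsigned pending_exception;  /* 15 = SysTick */
    bool forced_hardfault;
    bool precise_bus_error;
    bool imprecise_bus_error;
    bool bfar_valid;
};

struct fault_summary
decode_fault(uint32_t icsr, uint32_t hfsr, uint32_t cfsr)
{
    struct fault_summary s;

    s.active_exception = icsr & ICSR_VECTACTIVE_MASK;
    s.pending_exception =
        (icsr >> ICSR_VECTPENDING_SHIFT) & ICSR_VECTPENDING_MASK;
    s.forced_hardfault = (hfsr & HFSR_FORCED) != 0;
    s.precise_bus_error = (cfsr & CFSR_PRECISERR) != 0;
    s.imprecise_bus_error = (cfsr & CFSR_IMPRECISERR) != 0;
    s.bfar_valid = (cfsr & CFSR_BFARVALID) != 0;
    return s;
}
```

Feeding it ICSR=0x0440f803, HFSR=0x40000000, CFSR=0x00000400 gives active exception 3 (HardFault), pending exception 15 (SysTick), a forced hard fault and an imprecise bus error with BFARVALID clear. With BFARVALID clear the BFAR value in the log is not a fault address, and for an imprecise error the stacked pc is not guaranteed to be the exact instruction that faulted.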
> > I have 2 application threads running: a BLE Gateway thread (Priority 1)
> > and a DEMO thread (Priority 2). I have checked both stacks and they seem
> > to have plenty of room to grow (I see 0xdeadbeef as you suggested).
> >
> > I wonder whether either:
> > a) I took an interrupt which trashed my BLE Gateway stack (since the
> > os_mbuf_copydata()->memcpy() happened on that thread).
> >
> > b) Somehow the count/offset in os_mbuf_copydata() is going negative,
> > causing me to trash my own stack.
> >
> > I've added some asserts there to make sure that isn't happening.
> >
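For reference, the kind of check meant in (b) can be sketched on a simplified copy loop. This is only an illustration under stated assumptions: struct chain_buf and chain_copydata() are hypothetical stand-ins, not the Mynewt struct os_mbuf or os_mbuf_copydata(). The point is that a negative count turns into a huge size_t once it reaches memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a chained buffer; not the Mynewt os_mbuf. */
struct chain_buf {
    struct chain_buf *next;
    const uint8_t *data;
    uint16_t len;
};

/*
 * Copy len bytes starting at offset off of the chain into dst.
 * Returns 0 on success, -1 if the chain is shorter than off + len.
 */
int
chain_copydata(const struct chain_buf *m, int off, int len, void *dst)
{
    uint8_t *udst = dst;
    int count;

    assert(off >= 0 && len >= 0);

    /* Skip whole buffers that lie entirely before the offset. */
    while (off > 0) {
        if (m == NULL) {
            return -1;
        }
        if (off < m->len) {
            break;
        }
        off -= m->len;
        m = m->next;
    }

    while (len > 0 && m != NULL) {
        count = m->len - off;
        if (count > len) {
            count = len;
        }
        /* A negative count would become a huge size_t inside memcpy(). */
        assert(count >= 0 && count <= len);
        memcpy(udst, m->data + off, (size_t)count);
        len -= count;
        udst += count;
        off = 0;
        m = m->next;
    }
    return len > 0 ? -1 : 0;
}
```

With asserts like these, a bad offset or length stops at the point where the value goes wrong, before memcpy() has a chance to run over the stack.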
> > I have a couple other questions.
> > 1) How can I get GDB to list all the threads running on the system?
> > I tried (gdb) info threads, but that only showed me a single thread
> > (maybe the one that was currently running).
>
> That would require modifying openocd to support Mynewt as an OS, which we
> have not tried doing yet.
> However, take a look at the gdb scripts under compiler/gdbmacros,
> specifically the file os.gdb. ‘os_tasks’ will list the tasks.
>
> > 2) It doesn't seem like the MPU is turned on for this platform. Is that
> > correct?
>
> Correct. MPUs have not been tackled yet.
>
> > Another note to add (kinda important) is that I'm running Mynewt version
> > 0.9.9. I haven't upgraded (yet!).
>
> There have been some bug fixes since then which might be of interest.
> https://issues.apache.org/jira/browse/MYNEWT-656 specifically looks
> like it might be a match.
>
> Later,
> M
>
>
> > Thanks,
> > Pritish
> >
> >
> >
> > On Wed, Apr 19, 2017 at 7:59 AM, will sanfilippo <[email protected]> wrote:
> >
> >> We try to keep the stack sizes really small in order to conserve memory
> >> for constrained platforms. The controller stack is pretty small and it
> >> gets pretty close to the bottom. Furthermore, it is a bit of a difficult
> >> task to test every combination of system configuration variables, so I
> >> would not be terribly surprised if there is a combination that can exceed
> >> the stack.
> >>
> >> It would be great if you could post to the list your target and/or any
> >> system configuration variables you might have changed, along with what
> >> was happening at the time (if you know), so the controller stack size can
> >> be adjusted accordingly.
> >>
> >> You can easily tell if the stack overflowed: look at the bottom of the
> >> stack, and if 0xdeadbeef is not there, it overflowed. The controller
> >> stack is called g_ble_ll_stack. If you are in gdb you can do this:
> >> x/32wx g_ble_ll_stack
> >> and it should show 0xdeadbeef for some number of words.
> >>
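The same check can be done in code rather than by eye. Below is a minimal C sketch, assuming the stack is an array of 32-bit words pre-filled with 0xdeadbeef and growing downward (as on Cortex-M). The names STACK_FILL_PATTERN and stack_words_unused() are hypothetical, not Mynewt APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill value mentioned in this thread; the macro name is made up here. */
#define STACK_FILL_PATTERN 0xdeadbeefu

/*
 * Count how many words at the low end of a stack still hold the fill
 * pattern. The stack grows downward, so the lowest addresses are the
 * last ones to be touched. A result of 0 means the stack was used all
 * the way to the bottom and has very likely overflowed.
 */
size_t
stack_words_unused(const uint32_t *stack_bottom, size_t stack_words)
{
    size_t i;

    for (i = 0; i < stack_words; i++) {
        if (stack_bottom[i] != STACK_FILL_PATTERN) {
            break;
        }
    }
    return i;
}
```

Called on a task's stack array and its size in words, it returns the remaining headroom. Note that it only detects use of the stack itself; it cannot show that an unrelated wild write landed in the middle of the stack.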
> >> What platform are you running this on, Pritish?
> >>
> >>
> >>> On Apr 19, 2017, at 2:14 AM, Andrey Serdtsev <[email protected]> wrote:
> >>>
> >>> Well, recently I also got stack corruption: 80 dwords for the BLE
> >>> controller's LL task was too low a value. Increasing it to 128 works
> >>> for me. I'm in doubt: in theory this should be the common case, but
> >>> de facto it's not. Possibly your exception is related to this. Anyway,
> >>> this requires more analysis.
> >>>
> >>> BR,
> >>> Andrey
> >>>
> >>> On 19.04.2017 01:20, Pritish Gandhi wrote:
> >>>> Hi All,
> >>>> I have leveraged the blecent demo application to build a BLE gateway
> >>>> type application. It works great most of the time, but on rare
> >>>> occasions I see a crash which I could really use some help debugging.
> >>>>
> >>>> Console logs:
> >>>> 18286:[ts=18286000ssb, mod=4 level=1] GATT procedure initiated: read; att_handle=43
> >>>> 18293:[ts=18293000ssb, mod=4 level=1] GATT procedure initiated: write; att_handle=44 len=2
> >>>> 18529:Unhandled interrupt (3), exception sp 0x10000760
> >>>> 18529: r0:0x100007a7  r1:0x20017d91  r2:0x20008534  r3:0x10010001
> >>>> 18529: r4:0x0000001c  r5:0xfffffffe  r6:0x00000001  r7:0x100007a7
> >>>> 18529: r8:0x00000000  r9:0x00000000 r10:0x10000000 r11:0x00000000
> >>>> 18529:r12:0x10000648  lr:0x08023753  pc:0x08025df6 psr:0x21000200
> >>>> 18529:ICSR:0x0440f803 HFSR:0x40000000 CFSR:0x00000400
> >>>> 18529:BFAR:0xe000ed38 MMFAR:0xe000ed34
> >>>>
> >>>> (gdb) list *0x08025df6
> >>>> 0x8025df6 is in memcpy (memcpy.c:23).
> >>>> 18 size_t nq = n >> 3;
> >>>> 19 asm volatile ("cld ; rep ; movsq ; movl %3,%%ecx ; rep ; movsb":"+c"
> >>>> 20      (nq), "+S"(p), "+D"(q)
> >>>> 21      :"r"((uint32_t) (n & 7)));
> >>>> 22 #else
> >>>> 23 while (n--) {
> >>>> 24 *q++ = *p++;
> >>>> 25 }
> >>>> 26 #endif
> >>>> 27
> >>>> (gdb) list *0x08023753
> >>>> 0x8023753 is in os_mbuf_copydata (os_mbuf.c:722).
> >>>> 717        m = SLIST_NEXT(m, om_next);
> >>>> 718    }
> >>>> 719    while (len > 0 && m != NULL) {
> >>>> 720        count = min(m->om_len - off, len);
> >>>> 721        memcpy(udst, m->om_data + off, count);
> >>>> 722        len -= count;
> >>>> 723        udst += count;
> >>>> 724        off = 0;
> >>>> 725        m = SLIST_NEXT(m, om_next);
> >>>> 726    }
> >>>>
> >>>> Dumping more from the stack from the crash log:
> >>>>
> >>>> (gdb) x/20wx 0x10000760
> >>>> 0x10000760 <ble_gateway_stack+1888>: 0x100007a7 0x20017d91 0x20008534 0x10010001
> >>>> 0x10000770 <ble_gateway_stack+1904>: 0x10000648 0x08023753 0x08025df6 0x21000200
> >>>> 0x10000780 <ble_gateway_stack+1920>: 0x08023738 0x20008514 0x00000002 0x20008514
> >>>> 0x10000790 <ble_gateway_stack+1936>: 0x00000001 0x00000000 0x00000000 0x0802c055
> >>>> 0x100007a0 <ble_gateway_stack+1952>: 0x00000000 0x0502bf6f 0x04000100 0x00501300
> >>>> (gdb)
> >>>> 0x100007b0 <ble_gateway_stack+1968>: 0x00220000 0xe3df95b1 0x8210d712 0x65664608
> >>>> 0x100007c0 <ble_gateway_stack+1984>: 0x1950c6c9 0x5fb80fba 0x01021fd0 0x10020305
> >>>> 0x100007d0 <ble_gateway_stack+2000>: 0x000000f1 0x00000000 0x00000000 0x00000000
> >>>> 0x100007e0 <ble_gateway_stack+2016>: 0x00000000 0x00000000 0x3e04bc00 0x0001022b
> >>>> 0x100007f0 <ble_gateway_stack+2032>: 0xb8158700 0x1ff4f5d8 0x03060102 0x17fe9f03
> >>>>
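The first eight words of that dump are the hardware-stacked exception frame, which is why they match the register values in the console log. As a rough aid for reading it, here is a minimal C sketch. It assumes the basic 8-word ARMv7-M frame with no floating-point context stacked; the names exc_frame, exc_frame_from_words() and exc_frame_prev_sp() are made up for illustration.

```c
#include <stdint.h>

/* Basic ARMv7-M exception frame, in the order the hardware pushes it. */
struct exc_frame {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
};

#define EXC_FRAME_BYTES  32u        /* assumption: no FP context stacked */
#define XPSR_STACK_ALIGN (1u << 9)  /* set if a 4-byte pad was inserted  */

/* Build a frame from eight consecutive words read at the exception sp. */
struct exc_frame
exc_frame_from_words(const uint32_t w[8])
{
    struct exc_frame f = {
        w[0], w[1], w[2], w[3], w[4], w[5], w[6], w[7]
    };
    return f;
}

/* Stack pointer of the interrupted code just before the exception. */
uint32_t
exc_frame_prev_sp(uint32_t exception_sp, const struct exc_frame *f)
{
    uint32_t sp = exception_sp + EXC_FRAME_BYTES;

    if (f->xpsr & XPSR_STACK_ALIGN) {
        sp += 4;
    }
    return sp;
}
```

For the frame at 0x10000760, the stacked psr 0x21000200 has bit 9 set, so under these assumptions the interrupted code's stack pointer was 0x10000784 and the word at 0x10000780 is alignment padding rather than part of a caller's frame.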
> >>>> It seems like the caller is:
> >>>> (gdb) list *0x0802c055
> >>>> 0x802c055 is in ble_hs_log_mbuf (ble_hs_log.c:31).
> >>>> 26 ble_hs_log_mbuf(const struct os_mbuf *om)
> >>>> 27 {
> >>>> 28    uint8_t u8;
> >>>> 29    int i;
> >>>> 30
> >>>> 31    for (i = 0; i < OS_MBUF_PKTLEN(om); i++) {
> >>>> 32        os_mbuf_copydata(om, i, 1, &u8);
> >>>> 33        BLE_HS_LOG(DEBUG, "0x%02x ", u8);
> >>>> 34    }
> >>>> 35 }
> >>>>
> >>>> But notice that I cannot trace back further to who called
> >>>> ble_hs_log_mbuf() because it seems like the stack has been trashed!!
> >>>>
> >>>> Any help is appreciated.
> >>>> Thanks,
> >>>> Pritish
> >>>>
> >>>
> >>
> >>
>
>
