Hi Marko,

You're right, MYNEWT-656 looks eerily similar to what I'm seeing.
My next task is to upgrade to the latest OS version.

Thanks again,
Pritish

On Wed, Apr 19, 2017 at 10:21 AM, marko kiiskila <[email protected]> wrote:

>
> > On Apr 19, 2017, at 10:06 AM, Pritish Gandhi <[email protected]> wrote:
> >
> > I'm sorry I forgot to give more details about my setup and architecture.
> >
> > I'm running this on an STM32F4Discovery EVB (so STM32F407VG). It can't be
> > the issue with the controller stack, since I'm not using the controller. In
> > my setup the STM32F4Discovery is driving an externally connected Broadcom
> > BT controller over UART HCI. So I'm only building the HCI component along
> > with the UART transport.
> >
> > Looking at the STM32F407VG data sheet:
> > ICSR:0x0440f803: Seems to signal that Systick interrupt is pending (not
> > interesting)
> > HFSR:0x40000000: Forced Hard Fault
> > CFSR:0x00000400: Imprecise Error
> >
> > I have 2 application threads running. A BLE Gateway thread (Priority 1) and
> > a DEMO thread (Priority 2). I have checked both stacks and they seem to
> > have plenty of room to grow (I see 0xdeadbeef as you suggested).
> >
> > I wonder whether either:
> > a) I took an interrupt which trashed my BLE Gateway stack (Since the
> > os_mbuf_copydata()->memcpy() happened on that thread).
> >
> > b) Somehow the count/offset in os_mbuf_copydata() is going negative
> > causing me to trash my own stack.
> >
> > I've added some asserts there to make sure that isn't happening.
> >
> > I have a couple other questions.
> > 1) How to get information through GDB on what are all the threads running
> > on the system?
> > I tried (gdb) info threads : But that only showed me a single thread (maybe
> > the one that was currently running)
>
> This would need modifying openocd to support mynewt as OS. Which we
> have not tried doing yet.
> However, take a look at gdb scripts under compiler/gdbmacros.
> Specifically file os.gdb. ‘os_tasks’ will list the tasks.
>
> > 2) It doesn't seem like the MPU is turned on for this platform. Is that
> > correct?
>
> Correct. MPUs have not been tackled yet.
>
> > Another note to add (kinda important) is that I'm running Mynewt version
> > 0.9.9. I haven't upgraded (yet!).
>
> There’s been some bug fixes since then which might be of interest.
> https://issues.apache.org/jira/browse/MYNEWT-656 specifically looks
> like it might be a match.
>
> Later,
> M
>
> > Thanks,
> > Pritish
> >
> > On Wed, Apr 19, 2017 at 7:59 AM, will sanfilippo <[email protected]> wrote:
> >
> >> We try to keep the stack sizes really small in order to conserve memory
> >> for constrained platforms. The controller stack is pretty small and it gets
> >> pretty close to the bottom. Furthermore, it is a bit of a difficult task to
> >> test every combination of system configuration variables so I would not be
> >> terribly surprised if there is a combination that can exceed the stack.
> >>
> >> It would be great if you could post to the list your target and/or system
> >> configuration variables you might have changed along with what was
> >> happening at the time (if you know) so the controller stack size can be
> >> adjusted accordingly.
> >>
> >> You can easily tell if the stack overflowed; just look at the bottom of
> >> the stack and if 0xdeadbeef is not there it overflowed. The controller
> >> stack is called g_ble_ll_stack. If you are in gdb you can do this: x/32wx
> >> g_ble_ll_stack and it should show 0xdeadbeef for some amount of words.
> >>
> >> What platform are you running this on Pritish?
> >>
> >>> On Apr 19, 2017, at 2:14 AM, Andrey Serdtsev <[email protected]> wrote:
> >>>
> >>> Well, recently I've also hit stack corruption: 80 dwords for BLE
> >>> controller's LL task was too low a value. Increasing it to 128 works for me.
> >>> I'm in doubt, in theory this should be the common case, but de-facto
> >>> it's not. Possibly your exception is related to the case. Anyway, this
> >>> requires more analysis.
> >>>
> >>> BR,
> >>> Andrey
> >>>
> >>> On 19.04.2017 01:20, Pritish Gandhi wrote:
> >>>> Hi All,
> >>>> I have leveraged the blecent demo application to build a BLE gateway type
> >>>> application. It works great most of the time but rarely I see a crash which
> >>>> I could really use some help debugging.
> >>>>
> >>>> Console logs:
> >>>> 18286:[ts=18286000ssb, mod=4 level=1] GATT procedure initiated: read; att_handle=43
> >>>> 18293:[ts=18293000ssb, mod=4 level=1] GATT procedure initiated: write; att_handle=44 len=2
> >>>> 18529:Unhandled interrupt (3), exception sp 0x10000760
> >>>> 18529: r0:0x100007a7 r1:0x20017d91 r2:0x20008534 r3:0x10010001
> >>>> 18529: r4:0x0000001c r5:0xfffffffe r6:0x00000001 r7:0x100007a7
> >>>> 18529: r8:0x00000000 r9:0x00000000 r10:0x10000000 r11:0x00000000
> >>>> 18529:r12:0x10000648 lr:0x08023753 pc:0x08025df6 psr:0x21000200
> >>>> 18529:ICSR:0x0440f803 HFSR:0x40000000 CFSR:0x00000400
> >>>> 18529:BFAR:0xe000ed38 MMFAR:0xe000ed34
> >>>>
> >>>> (gdb) list *0x08025df6
> >>>> 0x8025df6 is in memcpy (memcpy.c:23).
> >>>> 18          size_t nq = n >> 3;
> >>>> 19          asm volatile ("cld ; rep ; movsq ; movl %3,%%ecx ; rep ; movsb":"+c"
> >>>> 20                        (nq), "+S"(p), "+D"(q)
> >>>> 21                        :"r"((uint32_t) (n & 7)));
> >>>> 22      #else
> >>>> 23          while (n--) {
> >>>> 24              *q++ = *p++;
> >>>> 25          }
> >>>> 26      #endif
> >>>> 27
> >>>> (gdb) list *0x08023753
> >>>> 0x8023753 is in os_mbuf_copydata (os_mbuf.c:722).
> >>>> 717             m = SLIST_NEXT(m, om_next);
> >>>> 718         }
> >>>> 719         while (len > 0 && m != NULL) {
> >>>> 720             count = min(m->om_len - off, len);
> >>>> 721             memcpy(udst, m->om_data + off, count);
> >>>> 722             len -= count;
> >>>> 723             udst += count;
> >>>> 724             off = 0;
> >>>> 725             m = SLIST_NEXT(m, om_next);
> >>>> 726         }
> >>>>
> >>>> Dumping more from the stack from the crash log:
> >>>>
> >>>> (gdb) x/20wx 0x10000760
> >>>> 0x10000760 <ble_gateway_stack+1888>: 0x100007a7 0x20017d91 0x20008534 0x10010001
> >>>> 0x10000770 <ble_gateway_stack+1904>: 0x10000648 0x08023753 0x08025df6 0x21000200
> >>>> 0x10000780 <ble_gateway_stack+1920>: 0x08023738 0x20008514 0x00000002 0x20008514
> >>>> 0x10000790 <ble_gateway_stack+1936>: 0x00000001 0x00000000 0x00000000 0x0802c055
> >>>> 0x100007a0 <ble_gateway_stack+1952>: 0x00000000 0x0502bf6f 0x04000100 0x00501300
> >>>> (gdb)
> >>>> 0x100007b0 <ble_gateway_stack+1968>: 0x00220000 0xe3df95b1 0x8210d712 0x65664608
> >>>> 0x100007c0 <ble_gateway_stack+1984>: 0x1950c6c9 0x5fb80fba 0x01021fd0 0x10020305
> >>>> 0x100007d0 <ble_gateway_stack+2000>: 0x000000f1 0x00000000 0x00000000 0x00000000
> >>>> 0x100007e0 <ble_gateway_stack+2016>: 0x00000000 0x00000000 0x3e04bc00 0x0001022b
> >>>> 0x100007f0 <ble_gateway_stack+2032>: 0xb8158700 0x1ff4f5d8 0x03060102 0x17fe9f03
> >>>>
> >>>> It seems like the caller is:
> >>>> (gdb) list *0x0802c055
> >>>> 0x802c055 is in ble_hs_log_mbuf (ble_hs_log.c:31).
> >>>> 26      ble_hs_log_mbuf(const struct os_mbuf *om)
> >>>> 27      {
> >>>> 28          uint8_t u8;
> >>>> 29          int i;
> >>>> 30
> >>>> 31          for (i = 0; i < OS_MBUF_PKTLEN(om); i++) {
> >>>> 32              os_mbuf_copydata(om, i, 1, &u8);
> >>>> 33              BLE_HS_LOG(DEBUG, "0x%02x ", u8);
> >>>> 34          }
> >>>> 35      }
> >>>>
> >>>> But notice that I cannot trace back further to who called ble_hs_log_mbuf()
> >>>> because it seems like the stack has been trashed!!
> >>>>
> >>>> Any help is appreciated.
> >>>> Thanks,
> >>>> Pritish
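
For anyone reading this thread later: the ICSR/HFSR/CFSR values from the crash log can be decoded mechanically instead of by hand. Below is a minimal sketch in C. It assumes the ARMv7-M System Control Block bit layout (which the Cortex-M4 in the STM32F407 uses). The function names are made up for illustration and are not part of Mynewt.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit positions assumed from the ARMv7-M System Control Block layout. */
#define HFSR_FORCED       (1u << 30)  /* escalated from a configurable fault */
#define CFSR_MMARVALID    (1u << 7)   /* MMFAR holds a valid address */
#define CFSR_IMPRECISERR  (1u << 10)  /* imprecise data bus error */
#define CFSR_BFARVALID    (1u << 15)  /* BFAR holds a valid address */

/* Exception number currently being handled (3 = HardFault). */
unsigned icsr_vectactive(uint32_t icsr)
{
    return icsr & 0x1ffu;
}

/* Highest-priority pending exception number (15 = SysTick). */
unsigned icsr_vectpending(uint32_t icsr)
{
    return (icsr >> 12) & 0x1ffu;
}

int fault_is_forced(uint32_t hfsr)
{
    return (hfsr & HFSR_FORCED) != 0;
}

int fault_is_imprecise_bus(uint32_t cfsr)
{
    return (cfsr & CFSR_IMPRECISERR) != 0;
}

/* True only if BFAR or MMFAR can be trusted as the faulting address. */
int fault_addr_valid(uint32_t cfsr)
{
    return (cfsr & (CFSR_BFARVALID | CFSR_MMARVALID)) != 0;
}

/* Hypothetical helper: print a one-line summary of the three registers. */
void print_fault_summary(uint32_t icsr, uint32_t hfsr, uint32_t cfsr)
{
    printf("active=%u pending=%u forced=%d imprecise=%d addr_valid=%d\n",
           icsr_vectactive(icsr), icsr_vectpending(icsr),
           fault_is_forced(hfsr), fault_is_imprecise_bus(cfsr),
           fault_addr_valid(cfsr));
}
```

Calling print_fault_summary(0x0440f803, 0x40000000, 0x00000400) with the values from the log agrees with the reading given in the thread: a forced HardFault caused by an imprecise bus error, with SysTick pending. Since neither address-valid bit is set, the BFAR/MMFAR values in the log should not be read as the faulting address, and with an imprecise error the stacked pc may be a few instructions past the store that actually faulted.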

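
The 0xdeadbeef check described in the thread (x/32wx on the stack array) can also be done in code. The following is a rough sketch only: it assumes the stack is an array of 32-bit words filled with 0xdeadbeef before use and that the stack grows toward lower addresses, as on Cortex-M. The function name is hypothetical and is not a Mynewt API.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill value mentioned in the thread. */
#define STACK_FILL_PATTERN 0xdeadbeefu

/*
 * Count how many words at the low-address end of a stack still hold the
 * fill pattern. On a descending stack these are the words never used.
 * A result of 0 means the stack was used all the way to the bottom, so an
 * overflow is likely.
 */
size_t stack_unused_words(const uint32_t *stack_bottom, size_t nwords)
{
    size_t i = 0;

    while (i < nwords && stack_bottom[i] == STACK_FILL_PATTERN) {
        i++;
    }
    return i;
}
```

Note that this only detects overflow by the task itself running past the bottom of its stack. It would not catch the other hypothesis in the thread, where a bad length passed to memcpy() overwrites the upper part of the stack.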