I'm really bad at GDB, and it's a bit of a rabbit hole :) I ported my application over to the git version of Mynewt-core and enabled OS_CRASH_STACKTRACE.
With it enabled, the following is the dump.

#mesh-onoff STATUS: Sent !
Action Received over MESH Length :- 14
000486 Unhandled interrupt (3), exception sp 0x2000aba0
000486 r0:0xcf0f98cb r1:0x5c5a76b3 r2:0x681af5c8 r3:0xb1334673
000486 r4:0x2000ac68 r5:0x00000007 r6:0x00000000 r7:0x200008a9
000486 r8:0x2000acf0 r9:0x00012101 r10:0xd7229882 r11:0xd929b3bb
000486 r12:0x7e3cdeb8 lr:0x2266a80b pc:0x59d8de5b psr:0xe8eb9828
000486 ICSR:0x00421803 HFSR:0x40000000 CFSR:0x00040000
000486 BFAR:0xe000ed38 MMFAR:0xe000ed34
000486 task:DECODE_TASK
000486 0x2000abec: 0x0003b4d8
000486 0x2000abf4: 0x000246a7
000486 0x2000ac04: 0x0003b4d8
000486 0x2000ac0c: 0x0002488d
000486 0x2000ac4c: 0x00012101
000486 0x2000ad0c: 0x0000c1e7
000486 0x2000ad1c: 0x0000c1e7
000486 0x2000ad2c: 0x0000c211
000486 0x2000ad30: 0x0003ad44
000486 0x2000ad3c: 0x00013023
000486 0x2000ad58: 0x000238e1
000486 0x2000ad60: 0x00037f81
000486 0x2000ad6c: 0x00023a79
000486 0x2000ad70: 0x00039b80
000486 0x2000ad74: 0x00039b7f
000486 0x2000ad84: 0x00023587
000486 0x2000ada8: 0x000087cd
000486 0x2000adc4: 0x0000d51d
000486 0x2000adc8: 0x0000d51c
000486 0x2000add8: 0x000398cd
000486 0x2000ade4: 0x000087e9
000486 0x2000ae08: 0x00010001
000486 0x2000ae0c: 0x0001c239
000486 0x2000ae10: 0x0003b35c
000486 0x2000ae1c: 0x00020001
000486 0x2000ae20: 0x0001c38d
000486 0x2000ae30: 0x00030001
000486 0x2000ae34: 0x0001c509
000486 0x2000ae48: 0x0001c38d
000486 0x2000ae5c: 0x0001c509
000486 0x2000ae70: 0x0001c239
000486 0x2000ae74: 0x0003b37c
000486 0x2000ae84: 0x0001c38d
000486 0x2000ae98: 0x0001c509
000486 0x2000aeac: 0x0001c54d
000486 0x2000aec0: 0x0001c239
000486 0x2000aec4: 0x0003ba28
000486 0x2000aed4: 0x0001c38d
000486 0x2000aee8: 0x0001c509
000486 0x2000aefc: 0x0001c38d
000486 0x2000af10: 0x0001c509
000486 0x2000af24: 0x0001c54d
000486 0x2000af38: 0x0001c38d
000486 0x2000af4c: 0x0001c509
000486 0x2000af60: 0x0001c38d
000486 0x2000af74: 0x0001c509
000486 0x2000af88: 0x0001c54d
000486 0x2000af9c: 0x0001c38d
000486 0x2000afb0: 0x0001c509

> On 31-Aug-2018, at 5:21 PM, marko kiiskila <[email protected]> wrote:
>
> Some suggestions (inline).
>
>> On Aug 31, 2018, at 2:32 PM, Aditya Xavier <[email protected]>
>> wrote:
>>
>> Gosh, this doesn’t make much sense to me :(
>>
>> (gdb) monitor go
>> (gdb) monitor reset
>> Resetting target
>> (gdb) c
>> Continuing.
>>
>> Program received signal SIGTRAP, Trace/breakpoint trap.
>> hal_system_reset () at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>> 50 asm("bkpt");
>> (gdb) bt
>> #0 hal_system_reset () at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>> #1 0x0000bf2e in os_default_irq (tf=0x2000ffc8) at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
>> #2 0x0000da56 in os_default_irq_asm () at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
>> #3 <signal handler called>
>> #4 0x00000000 in ?? ()
>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>> (gdb) frame 1
>> #1 0x0000bf2e in os_default_irq (tf=0x2000ffc8) at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
>> 170 hal_system_reset();
>> (gdb) p/x *tf
>> $1 = {ef = 0x2000abd0, r4 = 0x1b000000, r5 = 0x2000acc0, r6 = 0x2000aca0,
>> r7 = 0x7, r8 = 0x0, r9 = 0x200008a9, r10 = 0x2000ad28, r11 = 0x11d91,
>> lr = 0xfffffffd}
>> (gdb) p/x *tf->ef
>> $2 = {r0 = 0xd7229882, r1 = 0xd929b3bb, r2 = 0xcf0f98cb, r3 = 0x5c5a76b3,
>> r12 = 0x681af5c8, lr = 0xb1334673, pc = 0x7e3cdeb8, psr = 0x2266a80b}
>> (gdb) x/32x 0xd7229882
>> 0xd7229882: 0x00000000 0x00000000 0x00000000 0x00000000
>> 0xd7229892: 0x00000000 0x00000000 0x00000000 0x00000000
>> 0xd72298a2: 0x00000000 0x00000000 0x00000000 0x00000000
>> 0xd72298b2: 0x00000000 0x00000000 0x00000000 0x00000000
>> 0xd72298c2: 0x00000000 0x00000000 0x00000000 0x00000000
>> 0xd72298d2: 0x00000000 0x00000000 0x00000000 0x00000000
>> 0xd72298e2: 0x00000000 0x00000000 0x00000000 0x00000000
>> 0xd72298f2: 0x00000000 0x00000000 0x00000000 0x00000000
>> (gdb) x/32x 0x2000abd0
>> 0x2000abd0: 0xd7229882 0xd929b3bb 0xcf0f98cb 0x5c5a76b3
>> 0x2000abe0: 0x681af5c8 0xb1334673 0x7e3cdeb8 0x2266a80b
>> 0x2000abf0: 0x59d8de5b 0xe8eb9828 0x96d74690 0xb4b1ee9b
>> 0x2000ac00: 0x95f0cad6 0x7d1b52fe 0xebcc146e 0x5f7dfaf5
>> 0x2000ac10: 0x62dd2c19 0x1fc67ee7 0xf40a6a89 0xab77907c
>
> ^^^^^ looks bad, especially the top area. Should have dump of registers
> stored at the time the crash.
>
>
>> 0x2000ac20: 0x00000010 0x00039c74 0x2000ad28 0x0002329f
>> 0x2000ac30: 0xd87c5730 0xa203a288 0x00000010 0x00039c74
>> 0x2000ac40: 0x2000ad28 0x00023485 0x00000000 0x00000000
>> (gdb) p &__text
>> No symbol "__text" in current context.
>> (gdb) p &__etext
>> $3 = (<data variable, no debug info> *) 0x3a9c8
>> (gdb) p &__text
>> No symbol "__text" in current context.
>
> This was probably added at the same time as OS_STACK_BACKTRACE.
> You’re looking for values between start of your image slot and 0x3a9c8.
>
>> (gdb) x/i 0xd7229882
>> 0xd7229882: movs r0, r0
>> (gdb) list *0xd7229882
>> (gdb) x/i 0x681af5c8
>> 0x681af5c8: movs r0, r0
>> (gdb) x/i 0x59d8de5b
>> 0x59d8de5b: movs r0, r0
>> (gdb) x/i 0x62dd2c19
>> 0x62dd2c19: movs r0, r0
>> (gdb) x/i 0x2000ad28
>> 0x2000ad28: lsls r0, r2, #6
>> (gdb) x/i 0x1fc67ee7
>> 0x1fc67ee7: movs r0, r0
>> (gdb) x/i 0xa203a288
>> 0xa203a288: movs r0, r0
>> (gdb) x/i 0xe8eb9828
>> 0xe8eb9828: movs r0, r0
>> (gdb) x/i 0xcf0f98cb
>> 0xcf0f98cb: movs r0, r0
>> (gdb) x/i 0x96d74690
>> 0x96d74690: movs r0, r0
>> (gdb) x/i 0xf40a6a89
>> 0xf40a6a89: movs r0, r0
>> (gdb) x/i 0x2000ad28
>> 0x2000ad28: lsls r0, r2, #6
>> (gdb) x/i 0x00000010
>> 0x10: movs r0, r0
>> (gdb) x/i 0x0002329f
>> 0x2329f <shift_rows+108>: add sp, #20
>> (gdb) x/i 0x00039c74
>> 0x39c74 <sbox>: ldrb r3, [r4, #17]
>> (gdb) x/i 0xa203a288
>> 0xa203a288: movs r0, r0
>> (gdb) x/i 0x0002329f
>> 0x2329f <shift_rows+108>: add sp, #20
>> (gdb) list *0x0002329f
>> 0x2329f is in shift_rows (repos/apache-mynewt-core/crypto/tinycrypt/src/aes_encrypt.c:156).
>> 151 t[0] = s[0]; t[1] = s[5]; t[2] = s[10]; t[3] = s[15];
>> 152 t[4] = s[4]; t[5] = s[9]; t[6] = s[14]; t[7] = s[3];
>> 153 t[8] = s[8]; t[9] = s[13]; t[10] = s[2]; t[11] = s[7];
>> 154 t[12] = s[12]; t[13] = s[1]; t[14] = s[6]; t[15] = s[11];
>> 155 (void) _copy(s, sizeof(t), t, sizeof(t));
>> 156 }
>> 157
>> 158 int tc_aes_encrypt(uint8_t *out, const uint8_t *in, const TCAesKeySched_t s)
>> 159 {
>> 160 uint8_t state[Nk*Nb];
>
> That could be writing that random looking data in the stack. encrypted data
> should look like gibberish.
> Follow the stack a bit further starting continuing from 0x2000ac50. See if you
> find who called it. I’m hazarding a guess that one of those args passed to
> aes_encrypt() is pointing to stack, and there’s not enough memory allocated
> to hold that data.
>
>
>>> On 31-Aug-2018, at 4:46 PM, marko kiiskila <[email protected]> wrote:
>>>
>>> Sure. Something like this:
>>>
>>> 000933 compat> crash div0
>>> crash div0
>>> 003157 Unhandled interrupt (3), exception sp 0x20001dd8
>>> 003157 r0:0x00000000 r1:0x00017161 r2:0x00000000 r3:0x0000002a
>>> 003157 r4:0x200041d6 r5:0x00000000 r6:0x20000318 r7:0x00000000
>>> 003157 r8:0x00000000 r9:0x00000000 r10:0x00000000 r11:0x00000000
>>> 003157 r12:0x00000000 lr:0x00014949 pc:0x00014978 psr:0x61000000
>>> 003157 ICSR:0x00421803 HFSR:0x40000000 CFSR:0x02000000
>>> 003157 BFAR:0xe000ed38 MMFAR:0xe000ed34
>>>
>>> Then from gdb:
>>>
>>> Program received signal SIGTRAP, Trace/breakpoint trap.
>>> hal_system_reset ()
>>> at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>> 50 asm("bkpt");
>>> (gdb) bt
>>> #0 hal_system_reset ()
>>> at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>> #1 0x00008be8 in os_default_irq (tf=0x2000ffc0)
>>> at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:171
>>> #2 0x0000a5b6 in os_default_irq_asm ()
>>> at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
>>> #3 <signal handler called>
>>> #4 0x00000000 in ?? ()
>>> #5 0x0000812c in Reset_Handler ()
>>> at repos/apache-mynewt-core/hw/bsp/nrf52dk/src/arch/cortex_m4/gcc_startup_nrf52.s:180
>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>> (gdb) frame 1
>>> #1 0x00008be8 in os_default_irq (tf=0x2000ffc0)
>>> at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:171
>>> 171 hal_system_reset();
>>> (gdb) p/x *tf
>>> $1 = {ef = 0x20001dd8, r4 = 0x200041d6, r5 = 0x0, r6 = 0x20000318, r7 = 0x0,
>>> r8 = 0x0, r9 = 0x0, r10 = 0x0, r11 = 0x0, lr = 0xfffffffd}
>>> (gdb) p/x *tf->ef
>>> $2 = {r0 = 0x0, r1 = 0x17161, r2 = 0x0, r3 = 0x2a, r12 = 0x0, lr = 0x14949,
>>> pc = 0x14978, psr = 0x61000000}
>>> (gdb) x/32x 0x20001dd8
>>> 0x20001dd8 <os_main_stack+3896>: 0x00000000 0x00017161 0x00000000 0x0000002a
>>> 0x20001de8 <os_main_stack+3912>: 0x00000000 0x00014949 0x00014978 0x61000000
>>> 0x20001df8 <os_main_stack+3928>: 0x00000003 0x00000000 0x00000000 0x0000002a
>>> 0x20001e08 <os_main_stack+3944>: 0x00000001 0x00000002 0x0000000a 0x00014a21
>>> 0x20001e18 <os_main_stack+3960>: 0x00014a15 0x0000ebd9 0x00000000 0x200041d0
>>> 0x20001e28 <os_main_stack+3976>: 0x200041d6 0x00000000 0x0000000a 0x0001574d
>>> 0x20001e38 <os_main_stack+3992>: 0x00015741 0x0000c925 0x200041d0 0x00000011
>>> 0x20001e48 <os_main_stack+4008>: 0x00000073 0x200041d3 0x00000000 0x0000ede9
>>> (gdb) p &__text
>>> $3 = (<data variable, no debug info> *) 0x8020 <__isr_vector>
>>> (gdb) p &__etext
>>> $4 = (<data variable, no debug info> *) 0x175f0
>>> (gdb) x/i 0x00017161
>>> 0x17161: movs r0, r0
>>> (gdb) x/i 0x00014949
>>> 0x14949 <crash_device+12>: cbz r0, 0x1496a <crash_device+46>
>>> (gdb) x/i 0x00014978
>>> 0x14978 <crash_device+60>: sdiv r3, r3, r2
>>> (gdb) x/i 0x00014a21
>>> 0x14a21 <crash_cli_cmd+12>: cbz r0, 0x14a28 <crash_cli_cmd+20>
>>> (gdb) x/i 0x00014a15
>>> 0x14a15 <crash_cli_cmd>: push {r3, lr}
>>> (gdb) list *0x14949
>>> 0x14949 is in crash_device (repos/apache-mynewt-core/test/crash_test/src/crash_test.c:42).
>>> warning: Source file is more recent than executable.
>>> 37 int
>>> 38 crash_device(char *how)
>>> 39 {
>>> 40     volatile int val1, val2, val3;
>>> 41
>>> 42     if (!strcmp(how, "div0")) {
>>> 43
>>> 44         val1 = 42;
>>> 45         val2 = 0;
>>> 46
>>> (gdb) list *0x00014a21
>>> 0x14a21 is in crash_cli_cmd (repos/apache-mynewt-core/test/crash_test/src/crash_cli.c:41).
>>> 36 };
>>> 37
>>> 38 static int
>>> 39 crash_cli_cmd(int argc, char **argv)
>>> 40 {
>>> 41     if (argc >= 2 && crash_device(argv[1]) == 0) {
>>> 42         return 0;
>>> 43     }
>>> 44     console_printf("Usage crash [div0|jump0|ref0|assert|wdog]\n");
>>> 45     return 0;
>>> (gdb) list *0x14a21
>>> 0x14a21 is in crash_cli_cmd (repos/apache-mynewt-core/test/crash_test/src/crash_cli.c:41).
>>> 36 };
>>> 37
>>> 38 static int
>>> 39 crash_cli_cmd(int argc, char **argv)
>>> 40 {
>>> 41     if (argc >= 2 && crash_device(argv[1]) == 0) {
>>> 42         return 0;
>>> 43     }
>>> 44     console_printf("Usage crash [div0|jump0|ref0|assert|wdog]\n");
>>> 45     return 0;
>>>
>>> good luck.
>>>
>>>> On Aug 31, 2018, at 2:10 PM, Aditya Xavier <[email protected]>
>>>> wrote:
>>>>
>>>> It seems OS_CRASH_STACKTRACE was introduced after 1.4.1 and hence not in
>>>> the release.
>>>>
>>>> If I change the release, I believe there would be many API changes to be
>>>> done on MESH side.
>>>>
>>>> Can you guide me on how to "manually walk the stack for looking for things
>>>> which look like pointers to text” ?
>>>>
>>>> My gdb skill are pretty weak.
>>>>
>>>> I tried gdb where, with the following outcome.
>>>>
>>>> (gdb) c
>>>> Continuing.
>>>>
>>>>
>>>> Program received signal SIGTRAP, Trace/breakpoint trap.
>>>> hal_system_reset () at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>>> 50 asm("bkpt");
>>>> (gdb)
>>>> Continuing.
>>>>
>>>> Program received signal SIGTRAP, Trace/breakpoint trap.
>>>> hal_system_reset () at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>>> 50 asm("bkpt");
>>>> (gdb) where
>>>> #0 hal_system_reset () at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>>> #1 0x0000bf2e in os_default_irq (tf=0x2000ffc8) at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
>>>> #2 0x0000da56 in os_default_irq_asm () at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
>>>> #3 <signal handler called>
>>>> #4 0x00000000 in ?? ()
>>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>>>
>>>>
>>>>
>>>>> On 31-Aug-2018, at 4:30 PM, marko kiiskila <[email protected]> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> On Aug 31, 2018, at 1:47 PM, Aditya Xavier <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> Hi !
>>>>>>
>>>>>> Am having an issue with Sending and Receiving a Mesh Message. Though am
>>>>>> positive the problem is more towards releasing the semaphore.
>>>>>>
>>>>>> Action Received over MESH Length :- 15
>>>>>> 012273 Unhandled interrupt (3), exception sp 0x2000abd0
>>>>>> 012273 r0:0xd7229882 r1:0xd929b3bb r2:0xcf0f98cb r3:0x5c5a76b3
>>>>>> 012273 r4:0x1b000000 r5:0x2000acc0 r6:0x2000aca0 r7:0x00000008
>>>>>> 012273 r8:0x00000000 r9:0x200008a9 r10:0x2000ad28 r11:0x00011d91
>>>>>> 012273 r12:0x681af5c8 lr:0xb1334673 pc:0x7e3cdeb8 psr:0x2266a80b
>>>>>> 012273 ICSR:0x00411803 HFSR:0x40000000 CFSR:0x00040000
>>>>>> 012273 BFAR:0xe000ed38 MMFAR:0xe000ed34
>>>>>>
>>>>>> Am sending a group mesh message for testing. The sequence of events are
>>>>>> as follows.
>>>>>>
>>>>>> Button TASK -> send message over MESH -> Mesh receives message on model
>>>>>> -> copies the data and starts releases the Semaphore for another task ->
>>>>>> LOG Task starts and logs the message.
>>>>>>
>>>>>> In this entire flow, the moment I receive the message and release the
>>>>>> semaphore the firmware crashes.
>>>>>>
>>>>>> I tried increasing the STACK size of the LOG task, however that didn’t
>>>>>> help.
>>>>>>
>>>>>> Could someone let me know how to understand where / why the crash is
>>>>>> happening ?
>>>>>
>>>>> Looking at your registers they seem to be garbage, so I’m guessing stack
>>>>> corruption of some sort; does not have to be overflow.
>>>>> Try turning on OS_CRASH_STACKTRACE, or manually walk the stack for
>>>>> looking for things which look like pointers to text.
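
For anyone following along, here is a rough offline sketch of the "walk the stack looking for pointers to text" step. It only filters stack words by address range; it is not part of Mynewt or GDB. The names find_text_pointers and DIV0_STACK are made up for illustration. The data is copied from the `x/32x 0x20001dd8` output in the div0 example, and the text range comes from the `p &__text` (0x8020) and `p &__etext` (0x175f0) results in the same session.

```python
# Hypothetical helper for illustration only; not a Mynewt or GDB API.
def find_text_pointers(stack_base, words, text_start, text_end):
    """Return (stack_address, value) pairs whose value lies in [text_start, text_end)."""
    hits = []
    for index, value in enumerate(words):
        if text_start <= value < text_end:
            # Each stack word is 4 bytes wide on Cortex-M4.
            hits.append((stack_base + 4 * index, value))
    return hits


# Words copied from the `x/32x 0x20001dd8` dump in the div0 example.
DIV0_STACK = [
    0x00000000, 0x00017161, 0x00000000, 0x0000002A,
    0x00000000, 0x00014949, 0x00014978, 0x61000000,
    0x00000003, 0x00000000, 0x00000000, 0x0000002A,
    0x00000001, 0x00000002, 0x0000000A, 0x00014A21,
    0x00014A15, 0x0000EBD9, 0x00000000, 0x200041D0,
    0x200041D6, 0x00000000, 0x0000000A, 0x0001574D,
    0x00015741, 0x0000C925, 0x200041D0, 0x00000011,
    0x00000073, 0x200041D3, 0x00000000, 0x0000EDE9,
]

# Text range taken from `p &__text` and `p &__etext` in the same session.
candidates = find_text_pointers(0x20001DD8, DIV0_STACK, 0x8020, 0x175F0)

for address, value in candidates:
    # Print in a form that can be pasted into gdb as `x/i <value>` or `list *<value>`.
    print(f"0x{address:08x}: 0x{value:08x}")
```

The hits are only candidates. For example, 0x17161 falls inside the range, yet `x/i 0x00017161` in the session shows no symbol for it, so each value still needs a `x/i` or `list *` check as done in the thread. For the crashing image, the upper bound would be 0x3a9c8; the lower bound is the start of the image slot, which is not given in the thread.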
