Re: Mynewt crash when releasing semaphore

Aditya Xavier Fri, 31 Aug 2018 05:31:47 -0700

I am really bad at GDB. Also, it's like a rabbit hole :)

I ported my application over to the git version of Mynewt-core and enabled
OS_CRASH_STACKTRACE.

With it enabled, this is the dump:

#mesh-onoff STATUS: Sent !
Action Received over MESH Length :- 14
000486 Unhandled interrupt (3), exception sp 0x2000aba0
000486  r0:0xcf0f98cb  r1:0x5c5a76b3  r2:0x681af5c8  r3:0xb1334673
000486  r4:0x2000ac68  r5:0x00000007  r6:0x00000000  r7:0x200008a9
000486  r8:0x2000acf0  r9:0x00012101 r10:0xd7229882 r11:0xd929b3bb
000486 r12:0x7e3cdeb8  lr:0x2266a80b  pc:0x59d8de5b psr:0xe8eb9828
000486 ICSR:0x00421803 HFSR:0x40000000 CFSR:0x00040000
000486 BFAR:0xe000ed38 MMFAR:0xe000ed34
000486 task:DECODE_TASK
000486  0x2000abec: 0x0003b4d8
000486  0x2000abf4: 0x000246a7
000486  0x2000ac04: 0x0003b4d8
000486  0x2000ac0c: 0x0002488d
000486  0x2000ac4c: 0x00012101
000486  0x2000ad0c: 0x0000c1e7
000486  0x2000ad1c: 0x0000c1e7
000486  0x2000ad2c: 0x0000c211
000486  0x2000ad30: 0x0003ad44
000486  0x2000ad3c: 0x00013023
000486  0x2000ad58: 0x000238e1
000486  0x2000ad60: 0x00037f81
000486  0x2000ad6c: 0x00023a79
000486  0x2000ad70: 0x00039b80
000486  0x2000ad74: 0x00039b7f
000486  0x2000ad84: 0x00023587
000486  0x2000ada8: 0x000087cd
000486  0x2000adc4: 0x0000d51d
000486  0x2000adc8: 0x0000d51c
000486  0x2000add8: 0x000398cd
000486  0x2000ade4: 0x000087e9
000486  0x2000ae08: 0x00010001
000486  0x2000ae0c: 0x0001c239
000486  0x2000ae10: 0x0003b35c
000486  0x2000ae1c: 0x00020001
000486  0x2000ae20: 0x0001c38d
000486  0x2000ae30: 0x00030001
000486  0x2000ae34: 0x0001c509
000486  0x2000ae48: 0x0001c38d
000486  0x2000ae5c: 0x0001c509
000486  0x2000ae70: 0x0001c239
000486  0x2000ae74: 0x0003b37c
000486  0x2000ae84: 0x0001c38d
000486  0x2000ae98: 0x0001c509
000486  0x2000aeac: 0x0001c54d
000486  0x2000aec0: 0x0001c239
000486  0x2000aec4: 0x0003ba28
000486  0x2000aed4: 0x0001c38d
000486  0x2000aee8: 0x0001c509
000486  0x2000aefc: 0x0001c38d
000486  0x2000af10: 0x0001c509
000486  0x2000af24: 0x0001c54d
000486  0x2000af38: 0x0001c38d
000486  0x2000af4c: 0x0001c509
000486  0x2000af60: 0x0001c38d
000486  0x2000af74: 0x0001c509
000486  0x2000af88: 0x0001c54d
000486  0x2000af9c: 0x0001c38d
000486  0x2000afb0: 0x0001c509
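
For reference, the CFSR value in a dump like this can be decoded by hand. The sketch below maps CFSR bits to the flag names used in the ARMv7-M architecture reference. The helper name cfsr_decode() and its table are made up for illustration and are not part of Mynewt. With it, the 0x00040000 above decodes to INVPC (described by ARM as an invalid PC load on exception return), and the 0x02000000 from the div0 example further down the thread decodes to DIVBYZERO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One CFSR flag: bit mask plus the name used in the ARMv7-M reference. */
struct cfsr_flag {
    uint32_t mask;
    const char *name;
};

/* Hypothetical lookup table, not a Mynewt structure. */
static const struct cfsr_flag cfsr_flags[] = {
    /* MemManage status, bits 0-7 */
    { 1u << 0,  "IACCVIOL" },   { 1u << 1,  "DACCVIOL" },
    { 1u << 3,  "MUNSTKERR" },  { 1u << 4,  "MSTKERR" },
    { 1u << 5,  "MLSPERR" },    { 1u << 7,  "MMARVALID" },
    /* BusFault status, bits 8-15 */
    { 1u << 8,  "IBUSERR" },    { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" },{ 1u << 11, "UNSTKERR" },
    { 1u << 12, "STKERR" },     { 1u << 13, "LSPERR" },
    { 1u << 15, "BFARVALID" },
    /* UsageFault status, bits 16-31 */
    { 1u << 16, "UNDEFINSTR" }, { 1u << 17, "INVSTATE" },
    { 1u << 18, "INVPC" },      { 1u << 19, "NOCP" },
    { 1u << 24, "UNALIGNED" },  { 1u << 25, "DIVBYZERO" },
};

/*
 * Write the space-separated names of the flags set in cfsr into buf.
 * Returns the number of names written; stops early if buf is too small.
 */
static int
cfsr_decode(uint32_t cfsr, char *buf, size_t len)
{
    size_t i;
    size_t used = 0;
    int count = 0;
    int n;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof(cfsr_flags) / sizeof(cfsr_flags[0]); i++) {
        if (!(cfsr & cfsr_flags[i].mask)) {
            continue;
        }
        n = snprintf(buf + used, len - used, "%s%s",
                     count ? " " : "", cfsr_flags[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';   /* drop the partially written name */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Calling cfsr_decode(0x00040000, buf, sizeof(buf)) with a 64-byte buffer leaves "INVPC" in buf. The HFSR value 0x40000000 is the FORCED bit, meaning a configurable fault was escalated to HardFault, so the CFSR is the register that says what actually went wrong.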


> On 31-Aug-2018, at 5:21 PM, marko kiiskila <[email protected]> wrote:
> 
> Some suggestions (inline).
> 
>> On Aug 31, 2018, at 2:32 PM, Aditya Xavier <[email protected]> 
>> wrote:
>> 
>> Gosh, this doesn’t make much sense to me :(
>> 
>> (gdb) monitor go
>> (gdb) monitor reset
>> Resetting target
>> (gdb) c
>> Continuing.
>> 
>> Program received signal SIGTRAP, Trace/breakpoint trap.
>> hal_system_reset () at 
>> repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>> 50               asm("bkpt");
>> (gdb) bt
>> #0  hal_system_reset () at 
>> repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>> #1  0x0000bf2e in os_default_irq (tf=0x2000ffc8) at 
>> repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
>> #2  0x0000da56 in os_default_irq_asm () at 
>> repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
>> #3  <signal handler called>
>> #4  0x00000000 in ?? ()
>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>> (gdb) frame 1
>> #1  0x0000bf2e in os_default_irq (tf=0x2000ffc8) at 
>> repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
>> 170      hal_system_reset();
>> (gdb) p/x *tf
>> $1 = {ef = 0x2000abd0, r4 = 0x1b000000, r5 = 0x2000acc0, r6 = 0x2000aca0, r7 
>> = 0x7, r8 = 0x0, r9 = 0x200008a9, r10 = 0x2000ad28, r11 = 0x11d91, lr = 
>> 0xfffffffd}
>> (gdb) p/x *tf->ef
>> $2 = {r0 = 0xd7229882, r1 = 0xd929b3bb, r2 = 0xcf0f98cb, r3 = 0x5c5a76b3, 
>> r12 = 0x681af5c8, lr = 0xb1334673, pc = 0x7e3cdeb8, psr = 0x2266a80b}
>> (gdb) x/32x 0xd7229882
>> 0xd7229882:  0x00000000      0x00000000      0x00000000      0x00000000
>> 0xd7229892:  0x00000000      0x00000000      0x00000000      0x00000000
>> 0xd72298a2:  0x00000000      0x00000000      0x00000000      0x00000000
>> 0xd72298b2:  0x00000000      0x00000000      0x00000000      0x00000000
>> 0xd72298c2:  0x00000000      0x00000000      0x00000000      0x00000000
>> 0xd72298d2:  0x00000000      0x00000000      0x00000000      0x00000000
>> 0xd72298e2:  0x00000000      0x00000000      0x00000000      0x00000000
>> 0xd72298f2:  0x00000000      0x00000000      0x00000000      0x00000000
>> (gdb) x/32x 0x2000abd0
>> 0x2000abd0:  0xd7229882      0xd929b3bb      0xcf0f98cb      0x5c5a76b3
>> 0x2000abe0:  0x681af5c8      0xb1334673      0x7e3cdeb8      0x2266a80b
>> 0x2000abf0:  0x59d8de5b      0xe8eb9828      0x96d74690      0xb4b1ee9b
>> 0x2000ac00:  0x95f0cad6      0x7d1b52fe      0xebcc146e      0x5f7dfaf5
>> 0x2000ac10:  0x62dd2c19      0x1fc67ee7      0xf40a6a89      0xab77907c
> 
> ^^^^^ looks bad, especially the top area. It should hold the dump of registers
> stored at the time of the crash.
> 
> 
>> 0x2000ac20:  0x00000010      0x00039c74      0x2000ad28      0x0002329f
>> 0x2000ac30:  0xd87c5730      0xa203a288      0x00000010      0x00039c74
>> 0x2000ac40:  0x2000ad28      0x00023485      0x00000000      0x00000000
>> (gdb) p &__text
>> No symbol "__text" in current context.
>> (gdb)  p &__etext
>> $3 = (<data variable, no debug info> *) 0x3a9c8
>> (gdb) p &__text
>> No symbol "__text" in current context.
> 
> The __text symbol was probably added at the same time as OS_CRASH_STACKTRACE.
> You’re looking for values between the start of your image slot and 0x3a9c8.
> 
>> (gdb) x/i 0xd7229882
>>  0xd7229882: movs    r0, r0
>> (gdb) list *0xd7229882
>> (gdb) x/i 0x681af5c8
>>  0x681af5c8: movs    r0, r0
>> (gdb) x/i 0x59d8de5b
>>  0x59d8de5b: movs    r0, r0
>> (gdb) x/i 0x62dd2c19
>>  0x62dd2c19: movs    r0, r0
>> (gdb) x/i 0x2000ad28
>>  0x2000ad28: lsls    r0, r2, #6
>> (gdb) x/i 0x1fc67ee7
>>  0x1fc67ee7: movs    r0, r0
>> (gdb) x/i 0xa203a288
>>  0xa203a288: movs    r0, r0
>> (gdb) x/i 0xe8eb9828
>>  0xe8eb9828: movs    r0, r0
>> (gdb) x/i 0xcf0f98cb
>>  0xcf0f98cb: movs    r0, r0
>> (gdb) x/i 0x96d74690
>>  0x96d74690: movs    r0, r0
>> (gdb) x/i 0xf40a6a89
>>  0xf40a6a89: movs    r0, r0
>> (gdb) x/i 0x2000ad28
>>  0x2000ad28: lsls    r0, r2, #6
>> (gdb) x/i 0x00000010
>>  0x10:       movs    r0, r0
>> (gdb) x/i 0x0002329f
>>  0x2329f <shift_rows+108>:   add     sp, #20
>> (gdb) x/i 0x00039c74
>>  0x39c74 <sbox>:     ldrb    r3, [r4, #17]
>> (gdb) x/i 0xa203a288
>>  0xa203a288: movs    r0, r0
>> (gdb) x/i 0x0002329f
>>  0x2329f <shift_rows+108>:   add     sp, #20
>> (gdb) list *0x0002329f
>> 0x2329f is in shift_rows 
>> (repos/apache-mynewt-core/crypto/tinycrypt/src/aes_encrypt.c:156).
>> 151          t[0]  = s[0]; t[1] = s[5]; t[2] = s[10]; t[3] = s[15];
>> 152          t[4]  = s[4]; t[5] = s[9]; t[6] = s[14]; t[7] = s[3];
>> 153          t[8]  = s[8]; t[9] = s[13]; t[10] = s[2]; t[11] = s[7];
>> 154          t[12] = s[12]; t[13] = s[1]; t[14] = s[6]; t[15] = s[11];
>> 155          (void) _copy(s, sizeof(t), t, sizeof(t));
>> 156  }
>> 157  
>> 158  int tc_aes_encrypt(uint8_t *out, const uint8_t *in, const 
>> TCAesKeySched_t s)
>> 159  {
>> 160          uint8_t state[Nk*Nb];
> 
> That could be what is writing the random-looking data onto the stack; encrypted
> data should look like gibberish.
> Follow the stack a bit further, continuing from 0x2000ac50, and see if you can
> find who called it. I’m hazarding a guess that one of the args passed to
> tc_aes_encrypt() points to the stack, and there’s not enough memory allocated
> there to hold that data.
> 
> 
>>> On 31-Aug-2018, at 4:46 PM, marko kiiskila <[email protected]> wrote:
>>> 
>>> Sure. Something like this:
>>> 
>>> 000933 compat> crash div0
>>> crash div0
>>> 003157 Unhandled interrupt (3), exception sp 0x20001dd8
>>> 003157  r0:0x00000000  r1:0x00017161  r2:0x00000000  r3:0x0000002a
>>> 003157  r4:0x200041d6  r5:0x00000000  r6:0x20000318  r7:0x00000000
>>> 003157  r8:0x00000000  r9:0x00000000 r10:0x00000000 r11:0x00000000
>>> 003157 r12:0x00000000  lr:0x00014949  pc:0x00014978 psr:0x61000000
>>> 003157 ICSR:0x00421803 HFSR:0x40000000 CFSR:0x02000000
>>> 003157 BFAR:0xe000ed38 MMFAR:0xe000ed34
>>> 
>>> Then from gdb:
>>> 
>>> Program received signal SIGTRAP, Trace/breakpoint trap.
>>> hal_system_reset ()
>>>  at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>> 50              asm("bkpt");
>>> (gdb) bt
>>> #0  hal_system_reset ()
>>>  at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>> #1  0x00008be8 in os_default_irq (tf=0x2000ffc0)
>>>  at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:171
>>> #2  0x0000a5b6 in os_default_irq_asm ()
>>>  at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
>>> #3  <signal handler called>
>>> #4  0x00000000 in ?? ()
>>> #5  0x0000812c in Reset_Handler ()
>>>  at 
>>> repos/apache-mynewt-core/hw/bsp/nrf52dk/src/arch/cortex_m4/gcc_startup_nrf52.s:180
>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>> (gdb) frame 1
>>> #1  0x00008be8 in os_default_irq (tf=0x2000ffc0)
>>>  at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:171
>>> 171     hal_system_reset();
>>> (gdb) p/x *tf
>>> $1 = {ef = 0x20001dd8, r4 = 0x200041d6, r5 = 0x0, r6 = 0x20000318, r7 = 
>>> 0x0, 
>>> r8 = 0x0, r9 = 0x0, r10 = 0x0, r11 = 0x0, lr = 0xfffffffd}
>>> (gdb) p/x *tf->ef
>>> $2 = {r0 = 0x0, r1 = 0x17161, r2 = 0x0, r3 = 0x2a, r12 = 0x0, lr = 0x14949, 
>>> pc = 0x14978, psr = 0x61000000}
>>> (gdb) x/32x 0x20001dd8
>>> 0x20001dd8 <os_main_stack+3896>:    0x00000000      0x00017161      0x00000000      0x0000002a
>>> 0x20001de8 <os_main_stack+3912>:    0x00000000      0x00014949      0x00014978      0x61000000
>>> 0x20001df8 <os_main_stack+3928>:    0x00000003      0x00000000      0x00000000      0x0000002a
>>> 0x20001e08 <os_main_stack+3944>:    0x00000001      0x00000002      0x0000000a      0x00014a21
>>> 0x20001e18 <os_main_stack+3960>:    0x00014a15      0x0000ebd9      0x00000000      0x200041d0
>>> 0x20001e28 <os_main_stack+3976>:    0x200041d6      0x00000000      0x0000000a      0x0001574d
>>> 0x20001e38 <os_main_stack+3992>:    0x00015741      0x0000c925      0x200041d0      0x00000011
>>> 0x20001e48 <os_main_stack+4008>:    0x00000073      0x200041d3      0x00000000      0x0000ede9
>>> (gdb) p &__text
>>> $3 = (<data variable, no debug info> *) 0x8020 <__isr_vector>
>>> (gdb) p &__etext
>>> $4 = (<data variable, no debug info> *) 0x175f0
>>> (gdb) x/i 0x00017161
>>> 0x17161:    movs    r0, r0
>>> (gdb) x/i 0x00014949
>>> 0x14949 <crash_device+12>:  cbz     r0, 0x1496a <crash_device+46>
>>> (gdb) x/i 0x00014978
>>> 0x14978 <crash_device+60>:  sdiv    r3, r3, r2
>>> (gdb) x/i 0x00014a21
>>> 0x14a21 <crash_cli_cmd+12>: cbz     r0, 0x14a28 <crash_cli_cmd+20>
>>> (gdb) x/i 0x00014a15
>>> 0x14a15 <crash_cli_cmd>:    push    {r3, lr}
>>> (gdb) list *0x14949
>>> 0x14949 is in crash_device 
>>> (repos/apache-mynewt-core/test/crash_test/src/crash_test.c:42).
>>> warning: Source file is more recent than executable.
>>> 37  int
>>> 38  crash_device(char *how)
>>> 39  {
>>> 40      volatile int val1, val2, val3;
>>> 41  
>>> 42      if (!strcmp(how, "div0")) {
>>> 43  
>>> 44          val1 = 42;
>>> 45          val2 = 0;
>>> 46  
>>> (gdb) list *0x00014a21
>>> 0x14a21 is in crash_cli_cmd 
>>> (repos/apache-mynewt-core/test/crash_test/src/crash_cli.c:41).
>>> 36  };
>>> 37  
>>> 38  static int
>>> 39  crash_cli_cmd(int argc, char **argv)
>>> 40  {
>>> 41      if (argc >= 2 && crash_device(argv[1]) == 0) {
>>> 42          return 0;
>>> 43      }
>>> 44      console_printf("Usage crash [div0|jump0|ref0|assert|wdog]\n");
>>> 45      return 0;
>>> (gdb) list *0x14a21
>>> 0x14a21 is in crash_cli_cmd 
>>> (repos/apache-mynewt-core/test/crash_test/src/crash_cli.c:41).
>>> 36  };
>>> 37  
>>> 38  static int
>>> 39  crash_cli_cmd(int argc, char **argv)
>>> 40  {
>>> 41      if (argc >= 2 && crash_device(argv[1]) == 0) {
>>> 42          return 0;
>>> 43      }
>>> 44      console_printf("Usage crash [div0|jump0|ref0|assert|wdog]\n");
>>> 45      return 0;
>>> 
>>> good luck.
>>> 
>>>> On Aug 31, 2018, at 2:10 PM, Aditya Xavier <[email protected]> 
>>>> wrote:
>>>> 
>>>> It seems OS_CRASH_STACKTRACE was introduced after 1.4.1 and hence is not in
>>>> the release.
>>>> 
>>>> If I change the release, I believe many API changes would be needed on the
>>>> MESH side.
>>>> 
>>>> Can you guide me on how to "manually walk the stack looking for things
>>>> which look like pointers to text"?
>>>> 
>>>> My gdb skills are pretty weak.
>>>> 
>>>> I tried gdb where, with the following outcome.
>>>> 
>>>> (gdb) c
>>>> Continuing.
>>>> 
>>>> 
>>>> Program received signal SIGTRAP, Trace/breakpoint trap.
>>>> hal_system_reset () at 
>>>> repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>>> 50             asm("bkpt");
>>>> (gdb) 
>>>> Continuing.
>>>> 
>>>> Program received signal SIGTRAP, Trace/breakpoint trap.
>>>> hal_system_reset () at 
>>>> repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>>> 50             asm("bkpt");
>>>> (gdb) where
>>>> #0  hal_system_reset () at 
>>>> repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>>> #1  0x0000bf2e in os_default_irq (tf=0x2000ffc8) at 
>>>> repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
>>>> #2  0x0000da56 in os_default_irq_asm () at 
>>>> repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
>>>> #3  <signal handler called>
>>>> #4  0x00000000 in ?? ()
>>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>>> 
>>>> 
>>>> 
>>>>> On 31-Aug-2018, at 4:30 PM, marko kiiskila <[email protected]> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Aug 31, 2018, at 1:47 PM, Aditya Xavier <[email protected]> 
>>>>>> wrote:
>>>>>> 
>>>>>> Hi !
>>>>>> 
>>>>>> I am having an issue with sending and receiving a Mesh message, though I am
>>>>>> positive the problem has more to do with releasing the semaphore.
>>>>>> 
>>>>>> Action Received over MESH Length :- 15
>>>>>> 012273 Unhandled interrupt (3), exception sp 0x2000abd0
>>>>>> 012273  r0:0xd7229882  r1:0xd929b3bb  r2:0xcf0f98cb  r3:0x5c5a76b3
>>>>>> 012273  r4:0x1b000000  r5:0x2000acc0  r6:0x2000aca0  r7:0x00000008
>>>>>> 012273  r8:0x00000000  r9:0x200008a9 r10:0x2000ad28 r11:0x00011d91
>>>>>> 012273 r12:0x681af5c8  lr:0xb1334673  pc:0x7e3cdeb8 psr:0x2266a80b
>>>>>> 012273 ICSR:0x00411803 HFSR:0x40000000 CFSR:0x00040000
>>>>>> 012273 BFAR:0xe000ed38 MMFAR:0xe000ed34
>>>>>> 
>>>>>> I am sending a group mesh message for testing. The sequence of events is
>>>>>> as follows.
>>>>>> 
>>>>>> Button TASK -> send message over MESH -> Mesh receives message on model
>>>>>> -> copies the data and releases the semaphore for another task ->
>>>>>> LOG Task starts and logs the message.
>>>>>> 
>>>>>> In this entire flow, the moment I receive the message and release the 
>>>>>> semaphore the firmware crashes.
>>>>>> 
>>>>>> I tried increasing the STACK size of the LOG task, however that didn’t 
>>>>>> help.
>>>>>> 
>>>>>> Could someone let me know how to understand where / why the crash is 
>>>>>> happening ?
>>>>> 
>>>>> Looking at your registers, they seem to be garbage, so I’m guessing stack
>>>>> corruption of some sort; it does not have to be an overflow.
>>>>> Try turning on OS_CRASH_STACKTRACE, or manually walk the stack looking for
>>>>> things which look like pointers to text.
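
The manual stack walk suggested above can be sketched roughly as follows. This is a minimal illustration, not what OS_CRASH_STACKTRACE does internally: the function names are hypothetical, and the filter is deliberately strict in that it only accepts words with bit 0 set, since return addresses on a Cortex-M carry the Thumb bit. The bounds of the text region are inputs; in the thread they come from `p &__text` and `p &__etext` in gdb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return nonzero if word looks like a Thumb return address that points
 * into the half-open range [text_start, text_end).
 */
static int
looks_like_text_ptr(uint32_t word, uint32_t text_start, uint32_t text_end)
{
    uint32_t addr;

    /* Return addresses on Cortex-M have bit 0 set (Thumb state). */
    if ((word & 1u) == 0) {
        return 0;
    }
    addr = word & ~1u;
    return addr >= text_start && addr < text_end;
}

/*
 * Scan nwords stack words and record the index of each candidate in
 * out_idx, up to max_out entries. Returns the number of candidates found.
 */
static size_t
find_text_ptrs(const uint32_t *stack, size_t nwords,
               uint32_t text_start, uint32_t text_end,
               size_t *out_idx, size_t max_out)
{
    size_t i;
    size_t found = 0;

    assert(stack != NULL && out_idx != NULL);
    for (i = 0; i < nwords && found < max_out; i++) {
        if (looks_like_text_ptr(stack[i], text_start, text_end)) {
            out_idx[found++] = i;
        }
    }
    return found;
}
```

Fed the twelve words shown at 0x2000ac20 through 0x2000ac4c in the gdb session, with 0x3a9c8 as the end of text and an assumed image slot start of 0x8000 (the real start is not given in the thread), only 0x0002329f and 0x00023485 pass the filter. Each candidate can then be checked with `list *<address>` in gdb, as was done for 0x0002329f.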
