Re: Mynewt crash when releasing semaphore

marko kiiskila Fri, 31 Aug 2018 04:52:02 -0700

Some suggestions (inline).

> On Aug 31, 2018, at 2:32 PM, Aditya Xavier <[email protected]> 
> wrote:
> 
> Gosh, this doesn’t make much sense to me :(
> 
> (gdb) monitor go
> (gdb) monitor reset
> Resetting target
> (gdb) c
> Continuing.
> 
> Program received signal SIGTRAP, Trace/breakpoint trap.
> hal_system_reset () at 
> repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
> 50                asm("bkpt");
> (gdb) bt
> #0  hal_system_reset () at 
> repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
> #1  0x0000bf2e in os_default_irq (tf=0x2000ffc8) at 
> repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
> #2  0x0000da56 in os_default_irq_asm () at 
> repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
> #3  <signal handler called>
> #4  0x00000000 in ?? ()
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> (gdb) frame 1
> #1  0x0000bf2e in os_default_irq (tf=0x2000ffc8) at 
> repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
> 170       hal_system_reset();
> (gdb) p/x *tf
> $1 = {ef = 0x2000abd0, r4 = 0x1b000000, r5 = 0x2000acc0, r6 = 0x2000aca0, r7 
> = 0x7, r8 = 0x0, r9 = 0x200008a9, r10 = 0x2000ad28, r11 = 0x11d91, lr = 
> 0xfffffffd}
> (gdb) p/x *tf->ef
> $2 = {r0 = 0xd7229882, r1 = 0xd929b3bb, r2 = 0xcf0f98cb, r3 = 0x5c5a76b3, r12 
> = 0x681af5c8, lr = 0xb1334673, pc = 0x7e3cdeb8, psr = 0x2266a80b}
> (gdb) x/32x 0xd7229882
> 0xd7229882:   0x00000000      0x00000000      0x00000000      0x00000000
> 0xd7229892:   0x00000000      0x00000000      0x00000000      0x00000000
> 0xd72298a2:   0x00000000      0x00000000      0x00000000      0x00000000
> 0xd72298b2:   0x00000000      0x00000000      0x00000000      0x00000000
> 0xd72298c2:   0x00000000      0x00000000      0x00000000      0x00000000
> 0xd72298d2:   0x00000000      0x00000000      0x00000000      0x00000000
> 0xd72298e2:   0x00000000      0x00000000      0x00000000      0x00000000
> 0xd72298f2:   0x00000000      0x00000000      0x00000000      0x00000000
> (gdb) x/32x 0x2000abd0
> 0x2000abd0:   0xd7229882      0xd929b3bb      0xcf0f98cb      0x5c5a76b3
> 0x2000abe0:   0x681af5c8      0xb1334673      0x7e3cdeb8      0x2266a80b
> 0x2000abf0:   0x59d8de5b      0xe8eb9828      0x96d74690      0xb4b1ee9b
> 0x2000ac00:   0x95f0cad6      0x7d1b52fe      0xebcc146e      0x5f7dfaf5
> 0x2000ac10:   0x62dd2c19      0x1fc67ee7      0xf40a6a89      0xab77907c


^^^^^ looks bad, especially the top area. It should hold the dump of registers
stored at the time of the crash.
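
As a rough illustration of why that frame looks wrong, here is a minimal
sketch of a plausibility check on the stacked registers. The struct and
function names are made up for this example and are not part of Mynewt. It
assumes the usual Cortex-M exception frame layout (r0-r3, r12, lr, pc, psr)
and that the caller knows the text bounds of the image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Registers the Cortex-M hardware stacks on exception entry, in stack order. */
struct exc_frame {
    uint32_t r0, r1, r2, r3, r12, lr, pc, psr;
};

/* Thumb state bit in the stacked psr; expected to be set for running code. */
#define PSR_THUMB_BIT 0x01000000u

/*
 * Hypothetical helper: returns true when the stacked pc lies inside
 * [text_start, text_end) and the stacked psr has the Thumb bit set.
 * Failing either test suggests the frame was overwritten.
 */
static bool
frame_looks_sane(const struct exc_frame *ef, uint32_t text_start,
                 uint32_t text_end)
{
    assert(ef != NULL);

    if (ef->pc < text_start || ef->pc >= text_end) {
        return false;
    }
    return (ef->psr & PSR_THUMB_BIT) != 0;
}
```

With the values from this thread, the frame at 0x2000abd0 (pc = 0x7e3cdeb8,
psr = 0x2266a80b) fails both tests, while the frame from the div0 example
(pc = 0x14978, psr = 0x61000000, text from 0x8020 to 0x175f0) passes.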


> 0x2000ac20:   0x00000010      0x00039c74      0x2000ad28      0x0002329f
> 0x2000ac30:   0xd87c5730      0xa203a288      0x00000010      0x00039c74
> 0x2000ac40:   0x2000ad28      0x00023485      0x00000000      0x00000000
> (gdb) p &__text
> No symbol "__text" in current context.
> (gdb)  p &__etext
> $3 = (<data variable, no debug info> *) 0x3a9c8
> (gdb) p &__text
> No symbol "__text" in current context.

The __text symbol was probably added at the same time as OS_CRASH_STACKTRACE.
You’re looking for values between the start of your image slot and 0x3a9c8.
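
To make that filter concrete, here is a small sketch of the same search done
in code rather than by eye. The function name is invented for this example.
It assumes the stack words have been copied into an array, and it only keeps
odd values, on the assumption that return addresses on Cortex-M carry the
Thumb bit in bit 0. The image slot start is not shown in this thread, so any
value passed as text_start is a guess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copies into out[] every word that is odd and lies
 * inside [text_start, text_end). Returns the number of words stored, which
 * never exceeds out_len.
 */
static size_t
find_text_pointers(const uint32_t *words, size_t n_words,
                   uint32_t text_start, uint32_t text_end,
                   uint32_t *out, size_t out_len)
{
    size_t found = 0;

    assert(words != NULL && out != NULL);

    for (size_t i = 0; i < n_words && found < out_len; i++) {
        uint32_t w = words[i];

        /* Bit 0 set and inside the text range: candidate return address. */
        if ((w & 1u) != 0 && w >= text_start && w < text_end) {
            out[found++] = w;
        }
    }
    return found;
}
```

Run over the twelve words at 0x2000ac20 with an assumed text_start of 0x8000
and text_end of 0x3a9c8, this keeps 0x0002329f and 0x00023485. Even values
such as 0x00039c74 are dropped, since they are more likely pointers to
constant data than return addresses.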

> (gdb) x/i 0xd7229882
>   0xd7229882: movs    r0, r0
> (gdb) list *0xd7229882
> (gdb) x/i 0x681af5c8
>   0x681af5c8: movs    r0, r0
> (gdb) x/i 0x59d8de5b
>   0x59d8de5b: movs    r0, r0
> (gdb) x/i 0x62dd2c19
>   0x62dd2c19: movs    r0, r0
> (gdb) x/i 0x2000ad28
>   0x2000ad28: lsls    r0, r2, #6
> (gdb) x/i 0x1fc67ee7
>   0x1fc67ee7: movs    r0, r0
> (gdb) x/i 0xa203a288
>   0xa203a288: movs    r0, r0
> (gdb) x/i 0xe8eb9828
>   0xe8eb9828: movs    r0, r0
> (gdb) x/i 0xcf0f98cb
>   0xcf0f98cb: movs    r0, r0
> (gdb) x/i 0x96d74690
>   0x96d74690: movs    r0, r0
> (gdb) x/i 0xf40a6a89
>   0xf40a6a89: movs    r0, r0
> (gdb) x/i 0x2000ad28
>   0x2000ad28: lsls    r0, r2, #6
> (gdb) x/i 0x00000010
>   0x10:       movs    r0, r0
> (gdb) x/i 0x0002329f
>   0x2329f <shift_rows+108>:   add     sp, #20
> (gdb) x/i 0x00039c74
>   0x39c74 <sbox>:     ldrb    r3, [r4, #17]
> (gdb) x/i 0xa203a288
>   0xa203a288: movs    r0, r0
> (gdb) x/i 0x0002329f
>   0x2329f <shift_rows+108>:   add     sp, #20
> (gdb) list *0x0002329f
> 0x2329f is in shift_rows 
> (repos/apache-mynewt-core/crypto/tinycrypt/src/aes_encrypt.c:156).
> 151           t[0]  = s[0]; t[1] = s[5]; t[2] = s[10]; t[3] = s[15];
> 152           t[4]  = s[4]; t[5] = s[9]; t[6] = s[14]; t[7] = s[3];
> 153           t[8]  = s[8]; t[9] = s[13]; t[10] = s[2]; t[11] = s[7];
> 154           t[12] = s[12]; t[13] = s[1]; t[14] = s[6]; t[15] = s[11];
> 155           (void) _copy(s, sizeof(t), t, sizeof(t));
> 156   }
> 157   
> 158   int tc_aes_encrypt(uint8_t *out, const uint8_t *in, const 
> TCAesKeySched_t s)
> 159   {
> 160           uint8_t state[Nk*Nb];

That could be what is writing the random-looking data onto the stack;
encrypted data should look like gibberish.
Follow the stack a bit further, continuing from 0x2000ac50, and see if you
can find who called it. I’m hazarding a guess that one of the args passed to
tc_aes_encrypt() points to the stack, and there’s not enough memory allocated
to hold that data.
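
If that guess is right, the bug pattern would look roughly like the sketch
below. This is not tinycrypt code: fake_block_encrypt() is a hypothetical
stand-in that, like an AES block operation, always writes 16 bytes to its
output, and encrypt_block_checked() is an invented wrapper that refuses to
run when the destination is too small. The 15-byte buffer in the usage note
is only an illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16u  /* AES block size in bytes */

/* Hypothetical stand-in for a block cipher: always writes BLOCK_SIZE bytes. */
static void
fake_block_encrypt(uint8_t *out, const uint8_t *in)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        out[i] = (uint8_t)(in[i] ^ 0xa5u);  /* placeholder, not real AES */
    }
}

/*
 * Hypothetical wrapper: returns 0 on success, or -1 when out cannot hold a
 * full block or in is longer than one block. Short input is zero padded.
 */
static int
encrypt_block_checked(uint8_t *out, size_t out_len,
                      const uint8_t *in, size_t in_len)
{
    uint8_t padded[BLOCK_SIZE] = {0};

    assert(out != NULL && in != NULL);

    if (out_len < BLOCK_SIZE || in_len > BLOCK_SIZE) {
        return -1;
    }
    memcpy(padded, in, in_len);
    fake_block_encrypt(out, padded);
    return 0;
}
```

Called with a 15-byte destination on the stack, the wrapper returns -1
instead of letting the cipher write one byte past the end. With a 16-byte
destination it returns 0. A check like this at the call site is one way to
confirm or rule out the undersized-buffer theory.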


>> On 31-Aug-2018, at 4:46 PM, marko kiiskila <[email protected]> wrote:
>> 
>> Sure. Something like this:
>> 
>> 000933 compat> crash div0
>> crash div0
>> 003157 Unhandled interrupt (3), exception sp 0x20001dd8
>> 003157  r0:0x00000000  r1:0x00017161  r2:0x00000000  r3:0x0000002a
>> 003157  r4:0x200041d6  r5:0x00000000  r6:0x20000318  r7:0x00000000
>> 003157  r8:0x00000000  r9:0x00000000 r10:0x00000000 r11:0x00000000
>> 003157 r12:0x00000000  lr:0x00014949  pc:0x00014978 psr:0x61000000
>> 003157 ICSR:0x00421803 HFSR:0x40000000 CFSR:0x02000000
>> 003157 BFAR:0xe000ed38 MMFAR:0xe000ed34
>> 
>> Then from gdb:
>> 
>> Program received signal SIGTRAP, Trace/breakpoint trap.
>> hal_system_reset ()
>>   at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>> 50               asm("bkpt");
>> (gdb) bt
>> #0  hal_system_reset ()
>>   at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>> #1  0x00008be8 in os_default_irq (tf=0x2000ffc0)
>>   at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:171
>> #2  0x0000a5b6 in os_default_irq_asm ()
>>   at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
>> #3  <signal handler called>
>> #4  0x00000000 in ?? ()
>> #5  0x0000812c in Reset_Handler ()
>>   at 
>> repos/apache-mynewt-core/hw/bsp/nrf52dk/src/arch/cortex_m4/gcc_startup_nrf52.s:180
>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>> (gdb) frame 1
>> #1  0x00008be8 in os_default_irq (tf=0x2000ffc0)
>>   at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:171
>> 171      hal_system_reset();
>> (gdb) p/x *tf
>> $1 = {ef = 0x20001dd8, r4 = 0x200041d6, r5 = 0x0, r6 = 0x20000318, r7 = 0x0, 
>> r8 = 0x0, r9 = 0x0, r10 = 0x0, r11 = 0x0, lr = 0xfffffffd}
>> (gdb) p/x *tf->ef
>> $2 = {r0 = 0x0, r1 = 0x17161, r2 = 0x0, r3 = 0x2a, r12 = 0x0, lr = 0x14949, 
>> pc = 0x14978, psr = 0x61000000}
>> (gdb) x/32x 0x20001dd8
>> 0x20001dd8 <os_main_stack+3896>:     0x00000000      0x00017161      
>> 0x00000000      0x0000002a
>> 0x20001de8 <os_main_stack+3912>:     0x00000000      0x00014949      
>> 0x00014978      0x61000000
>> 0x20001df8 <os_main_stack+3928>:     0x00000003      0x00000000      
>> 0x00000000      0x0000002a
>> 0x20001e08 <os_main_stack+3944>:     0x00000001      0x00000002      
>> 0x0000000a      0x00014a21
>> 0x20001e18 <os_main_stack+3960>:     0x00014a15      0x0000ebd9      
>> 0x00000000      0x200041d0
>> 0x20001e28 <os_main_stack+3976>:     0x200041d6      0x00000000      
>> 0x0000000a      0x0001574d
>> 0x20001e38 <os_main_stack+3992>:     0x00015741      0x0000c925      
>> 0x200041d0      0x00000011
>> 0x20001e48 <os_main_stack+4008>:     0x00000073      0x200041d3      
>> 0x00000000      0x0000ede9
>> (gdb) p &__text
>> $3 = (<data variable, no debug info> *) 0x8020 <__isr_vector>
>> (gdb) p &__etext
>> $4 = (<data variable, no debug info> *) 0x175f0
>> (gdb) x/i 0x00017161
>>  0x17161:    movs    r0, r0
>> (gdb) x/i 0x00014949
>>  0x14949 <crash_device+12>:  cbz     r0, 0x1496a <crash_device+46>
>> (gdb) x/i 0x00014978
>>  0x14978 <crash_device+60>:  sdiv    r3, r3, r2
>> (gdb) x/i 0x00014a21
>>  0x14a21 <crash_cli_cmd+12>: cbz     r0, 0x14a28 <crash_cli_cmd+20>
>> (gdb) x/i 0x00014a15
>>  0x14a15 <crash_cli_cmd>:    push    {r3, lr}
>> (gdb) list *0x14949
>> 0x14949 is in crash_device 
>> (repos/apache-mynewt-core/test/crash_test/src/crash_test.c:42).
>> warning: Source file is more recent than executable.
>> 37   int
>> 38   crash_device(char *how)
>> 39   {
>> 40       volatile int val1, val2, val3;
>> 41   
>> 42       if (!strcmp(how, "div0")) {
>> 43   
>> 44           val1 = 42;
>> 45           val2 = 0;
>> 46   
>> (gdb) list *0x00014a21
>> 0x14a21 is in crash_cli_cmd 
>> (repos/apache-mynewt-core/test/crash_test/src/crash_cli.c:41).
>> 36   };
>> 37   
>> 38   static int
>> 39   crash_cli_cmd(int argc, char **argv)
>> 40   {
>> 41       if (argc >= 2 && crash_device(argv[1]) == 0) {
>> 42           return 0;
>> 43       }
>> 44       console_printf("Usage crash [div0|jump0|ref0|assert|wdog]\n");
>> 45       return 0;
>> (gdb) list *0x14a21
>> 0x14a21 is in crash_cli_cmd 
>> (repos/apache-mynewt-core/test/crash_test/src/crash_cli.c:41).
>> 36   };
>> 37   
>> 38   static int
>> 39   crash_cli_cmd(int argc, char **argv)
>> 40   {
>> 41       if (argc >= 2 && crash_device(argv[1]) == 0) {
>> 42           return 0;
>> 43       }
>> 44       console_printf("Usage crash [div0|jump0|ref0|assert|wdog]\n");
>> 45       return 0;
>> 
>> good luck.
>> 
>>> On Aug 31, 2018, at 2:10 PM, Aditya Xavier <[email protected]> 
>>> wrote:
>>> 
>>> It seems OS_CRASH_STACKTRACE was introduced after 1.4.1 and hence not in 
>>> the release.
>>> 
>>> If I change the release, I believe there would be many API changes to be 
>>> done on MESH side.
>>> 
>>> Can you guide me on how to "manually walk the stack for looking for things 
>>> which look like pointers to text” ?
>>> 
>>> My gdb skill are pretty weak.
>>> 
>>> I tried gdb where, with the following outcome.
>>> 
>>> (gdb) c
>>> Continuing.
>>> 
>>> 
>>> Program received signal SIGTRAP, Trace/breakpoint trap.
>>> hal_system_reset () at 
>>> repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>> 50              asm("bkpt");
>>> (gdb) 
>>> Continuing.
>>> 
>>> Program received signal SIGTRAP, Trace/breakpoint trap.
>>> hal_system_reset () at 
>>> repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>> 50              asm("bkpt");
>>> (gdb) where
>>> #0  hal_system_reset () at 
>>> repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>> #1  0x0000bf2e in os_default_irq (tf=0x2000ffc8) at 
>>> repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
>>> #2  0x0000da56 in os_default_irq_asm () at 
>>> repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
>>> #3  <signal handler called>
>>> #4  0x00000000 in ?? ()
>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>> 
>>> 
>>> 
>>>> On 31-Aug-2018, at 4:30 PM, marko kiiskila <[email protected]> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Aug 31, 2018, at 1:47 PM, Aditya Xavier <[email protected]> 
>>>>> wrote:
>>>>> 
>>>>> Hi !
>>>>> 
>>>>> Am having an issue with Sending and Receiving a Mesh Message. Though am 
>>>>> positive the problem is more towards releasing the semaphore.
>>>>> 
>>>>> Action Received over MESH Length :- 15
>>>>> 012273 Unhandled interrupt (3), exception sp 0x2000abd0
>>>>> 012273  r0:0xd7229882  r1:0xd929b3bb  r2:0xcf0f98cb  r3:0x5c5a76b3
>>>>> 012273  r4:0x1b000000  r5:0x2000acc0  r6:0x2000aca0  r7:0x00000008
>>>>> 012273  r8:0x00000000  r9:0x200008a9 r10:0x2000ad28 r11:0x00011d91
>>>>> 012273 r12:0x681af5c8  lr:0xb1334673  pc:0x7e3cdeb8 psr:0x2266a80b
>>>>> 012273 ICSR:0x00411803 HFSR:0x40000000 CFSR:0x00040000
>>>>> 012273 BFAR:0xe000ed38 MMFAR:0xe000ed34
>>>>> 
>>>>> Am sending a group mesh message for testing. The sequence of events are 
>>>>> as follows.
>>>>> 
>>>>> Button TASK -> send message over MESH -> Mesh receives message on model 
>>>>> -> copies the data and starts releases the Semaphore for another task -> 
>>>>> LOG Task starts and logs the message.
>>>>> 
>>>>> In this entire flow, the moment I receive the message and release the 
>>>>> semaphore the firmware crashes.
>>>>> 
>>>>> I tried increasing the STACK size of the LOG task, however that didn’t 
>>>>> help.
>>>>> 
>>>>> Could someone let me know how to understand where / why the crash is 
>>>>> happening ?
>>>> 
>>>> Looking at your registers they seem to be garbage, so I’m guessing stack
>>>> corruption of some sort; does not have to be overflow.
>>>> Try turning on OS_CRASH_STACKTRACE, or manually walk the stack for looking 
>>>> for things which
>>>> look like pointers to text.
>>>> 
>>>> 
>>> 
>> 
>
