Some suggestions (inline).
> On Aug 31, 2018, at 2:32 PM, Aditya Xavier <[email protected]>
> wrote:
>
> Gosh, this doesn’t make much sense to me :(
>
> (gdb) monitor go
> (gdb) monitor reset
> Resetting target
> (gdb) c
> Continuing.
>
> Program received signal SIGTRAP, Trace/breakpoint trap.
> hal_system_reset () at
> repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
> 50 asm("bkpt");
> (gdb) bt
> #0 hal_system_reset () at
> repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
> #1 0x0000bf2e in os_default_irq (tf=0x2000ffc8) at
> repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
> #2 0x0000da56 in os_default_irq_asm () at
> repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
> #3 <signal handler called>
> #4 0x00000000 in ?? ()
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> (gdb) frame 1
> #1 0x0000bf2e in os_default_irq (tf=0x2000ffc8) at
> repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
> 170 hal_system_reset();
> (gdb) p/x *tf
> $1 = {ef = 0x2000abd0, r4 = 0x1b000000, r5 = 0x2000acc0, r6 = 0x2000aca0, r7
> = 0x7, r8 = 0x0, r9 = 0x200008a9, r10 = 0x2000ad28, r11 = 0x11d91, lr =
> 0xfffffffd}
> (gdb) p/x *tf->ef
> $2 = {r0 = 0xd7229882, r1 = 0xd929b3bb, r2 = 0xcf0f98cb, r3 = 0x5c5a76b3, r12
> = 0x681af5c8, lr = 0xb1334673, pc = 0x7e3cdeb8, psr = 0x2266a80b}
> (gdb) x/32x 0xd7229882
> 0xd7229882: 0x00000000 0x00000000 0x00000000 0x00000000
> 0xd7229892: 0x00000000 0x00000000 0x00000000 0x00000000
> 0xd72298a2: 0x00000000 0x00000000 0x00000000 0x00000000
> 0xd72298b2: 0x00000000 0x00000000 0x00000000 0x00000000
> 0xd72298c2: 0x00000000 0x00000000 0x00000000 0x00000000
> 0xd72298d2: 0x00000000 0x00000000 0x00000000 0x00000000
> 0xd72298e2: 0x00000000 0x00000000 0x00000000 0x00000000
> 0xd72298f2: 0x00000000 0x00000000 0x00000000 0x00000000
> (gdb) x/32x 0x2000abd0
> 0x2000abd0: 0xd7229882 0xd929b3bb 0xcf0f98cb 0x5c5a76b3
> 0x2000abe0: 0x681af5c8 0xb1334673 0x7e3cdeb8 0x2266a80b
> 0x2000abf0: 0x59d8de5b 0xe8eb9828 0x96d74690 0xb4b1ee9b
> 0x2000ac00: 0x95f0cad6 0x7d1b52fe 0xebcc146e 0x5f7dfaf5
> 0x2000ac10: 0x62dd2c19 0x1fc67ee7 0xf40a6a89 0xab77907c
^^^^^ looks bad, especially the top area. It should hold the dump of registers
stored at the time of the crash.
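To make that concrete: the first eight words at the exception sp are the
frame the CPU pushes on exception entry (r0-r3, r12, lr, pc, psr). Below is a
rough sketch of a sanity check on such a frame, using your dump and the one
from my div0 example. The helper names are made up for illustration, and
0x8000 as the start of your image slot is an assumption (check your BSP's
flash map).

```python
# Hardware-stacked exception frame on Cortex-M (basic frame, no FPU state).
FRAME_REGS = ("r0", "r1", "r2", "r3", "r12", "lr", "pc", "psr")
XPSR_T_BIT = 1 << 24  # Thumb state bit; must be set while running code


def decode_frame(words):
    """Map the first eight stacked words to register names."""
    if len(words) < len(FRAME_REGS):
        raise ValueError("need at least 8 words")
    return dict(zip(FRAME_REGS, words))


def frame_looks_sane(frame, text_start, text_end):
    """True if pc lies inside the text range and xPSR has the Thumb bit set."""
    pc_ok = text_start <= frame["pc"] < text_end
    thumb_ok = bool(frame["psr"] & XPSR_T_BIT)
    return pc_ok and thumb_ok


# First two rows of `x/32x 0x2000abd0` from the failing session.
bad = decode_frame([0xd7229882, 0xd929b3bb, 0xcf0f98cb, 0x5c5a76b3,
                    0x681af5c8, 0xb1334673, 0x7e3cdeb8, 0x2266a80b])
# First two rows of `x/32x 0x20001dd8` from the div0 example.
good = decode_frame([0x00000000, 0x00017161, 0x00000000, 0x0000002a,
                     0x00000000, 0x00014949, 0x00014978, 0x61000000])

# ASSUMPTION: 0x8000 is the image slot start for the failing build.
bad_ok = frame_looks_sane(bad, 0x8000, 0x3a9c8)
# 0x8020 and 0x175f0 are &__text and &__etext from the div0 session.
good_ok = frame_looks_sane(good, 0x8020, 0x175f0)

print("failing frame sane:", bad_ok)
print("div0 frame sane:   ", good_ok)
```

A frame that fails both checks, like yours, usually means the stack contents
were overwritten before the fault was taken.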
> 0x2000ac20: 0x00000010 0x00039c74 0x2000ad28 0x0002329f
> 0x2000ac30: 0xd87c5730 0xa203a288 0x00000010 0x00039c74
> 0x2000ac40: 0x2000ad28 0x00023485 0x00000000 0x00000000
> (gdb) p &__text
> No symbol "__text" in current context.
> (gdb) p &__etext
> $3 = (<data variable, no debug info> *) 0x3a9c8
> (gdb) p &__text
> No symbol "__text" in current context.
The __text symbol was probably added at the same time as OS_CRASH_STACKTRACE.
You’re looking for values between the start of your image slot and 0x3a9c8.
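If checking each word by hand gets tedious, the filtering can be sketched in a
few lines. This is only an illustration run over the last three rows of your
dump (0x2000ac20 onwards); the function name is made up, and 0x8000 as the
slot start is an assumption. On Cortex-M, saved return addresses and function
pointers have bit 0 set (Thumb), so odd values in range are the interesting
ones; even values in range are more likely pointers to constant data.

```python
def text_pointers(base, words, text_start, text_end):
    """Return (stack_addr, value, is_thumb) for words inside the text range."""
    hits = []
    for i, word in enumerate(words):
        if text_start <= word < text_end:
            hits.append((base + 4 * i, word, bool(word & 1)))
    return hits


TEXT_START = 0x8000   # ASSUMPTION: start of the image slot
TEXT_END = 0x3a9c8    # &__etext from the gdb session

# Rows 0x2000ac20 .. 0x2000ac4c of `x/32x 0x2000abd0`.
dump = [0x00000010, 0x00039c74, 0x2000ad28, 0x0002329f,
        0xd87c5730, 0xa203a288, 0x00000010, 0x00039c74,
        0x2000ad28, 0x00023485, 0x00000000, 0x00000000]

hits = text_pointers(0x2000ac20, dump, TEXT_START, TEXT_END)
for addr, value, thumb in hits:
    kind = "odd, likely code address" if thumb else "even, possibly data"
    print(f"0x{addr:08x}: 0x{value:08x}  ({kind})")
```

Each odd hit is then worth a `list *<value>` in gdb.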
> (gdb) x/i 0xd7229882
> 0xd7229882: movs r0, r0
> (gdb) list *0xd7229882
> (gdb) x/i 0x681af5c8
> 0x681af5c8: movs r0, r0
> (gdb) x/i 0x59d8de5b
> 0x59d8de5b: movs r0, r0
> (gdb) x/i 0x62dd2c19
> 0x62dd2c19: movs r0, r0
> (gdb) x/i 0x2000ad28
> 0x2000ad28: lsls r0, r2, #6
> (gdb) x/i 0x1fc67ee7
> 0x1fc67ee7: movs r0, r0
> (gdb) x/i 0xa203a288
> 0xa203a288: movs r0, r0
> (gdb) x/i 0xe8eb9828
> 0xe8eb9828: movs r0, r0
> (gdb) x/i 0xcf0f98cb
> 0xcf0f98cb: movs r0, r0
> (gdb) x/i 0x96d74690
> 0x96d74690: movs r0, r0
> (gdb) x/i 0xf40a6a89
> 0xf40a6a89: movs r0, r0
> (gdb) x/i 0x2000ad28
> 0x2000ad28: lsls r0, r2, #6
> (gdb) x/i 0x00000010
> 0x10: movs r0, r0
> (gdb) x/i 0x0002329f
> 0x2329f <shift_rows+108>: add sp, #20
> (gdb) x/i 0x00039c74
> 0x39c74 <sbox>: ldrb r3, [r4, #17]
> (gdb) x/i 0xa203a288
> 0xa203a288: movs r0, r0
> (gdb) x/i 0x0002329f
> 0x2329f <shift_rows+108>: add sp, #20
> (gdb) list *0x0002329f
> 0x2329f is in shift_rows
> (repos/apache-mynewt-core/crypto/tinycrypt/src/aes_encrypt.c:156).
> 151 t[0] = s[0]; t[1] = s[5]; t[2] = s[10]; t[3] = s[15];
> 152 t[4] = s[4]; t[5] = s[9]; t[6] = s[14]; t[7] = s[3];
> 153 t[8] = s[8]; t[9] = s[13]; t[10] = s[2]; t[11] = s[7];
> 154 t[12] = s[12]; t[13] = s[1]; t[14] = s[6]; t[15] = s[11];
> 155 (void) _copy(s, sizeof(t), t, sizeof(t));
> 156 }
> 157
> 158 int tc_aes_encrypt(uint8_t *out, const uint8_t *in, const
> TCAesKeySched_t s)
> 159 {
> 160 uint8_t state[Nk*Nb];
That could be what is writing the random-looking data onto the stack; encrypted
data should look like gibberish.
Follow the stack a bit further, continuing from 0x2000ac50, and see if you can
find who called it. I’m hazarding a guess that one of the args passed to
tc_aes_encrypt() is pointing to the stack, and there’s not enough memory
allocated to hold that data.
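As an illustration of how that could happen (pure guesswork about your code):
AES works on 16-byte blocks, so an output buffer sized to the message length
is too small whenever the length is not a multiple of 16. Your log shows a
15-byte message. The helper below is hypothetical and only shows the
arithmetic.

```python
AES_BLOCK_SIZE = 16  # bytes per AES block


def buffer_size(msg_len):
    """Smallest multiple of the AES block size that holds msg_len bytes."""
    blocks = (msg_len + AES_BLOCK_SIZE - 1) // AES_BLOCK_SIZE
    return blocks * AES_BLOCK_SIZE


msg_len = 15  # "Action Received over MESH Length :- 15"
needed = buffer_size(msg_len)
overrun = needed - msg_len  # bytes written past a buffer of msg_len bytes

print(f"message {msg_len} bytes, buffer needed {needed} bytes, "
      f"overrun {overrun} byte(s) if sized to the message")
```

Even a one-byte overrun of a local buffer can clobber a saved register or
return address sitting next to it on the stack.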
>> On 31-Aug-2018, at 4:46 PM, marko kiiskila <[email protected]> wrote:
>>
>> Sure. Something like this:
>>
>> 000933 compat> crash div0
>> crash div0
>> 003157 Unhandled interrupt (3), exception sp 0x20001dd8
>> 003157 r0:0x00000000 r1:0x00017161 r2:0x00000000 r3:0x0000002a
>> 003157 r4:0x200041d6 r5:0x00000000 r6:0x20000318 r7:0x00000000
>> 003157 r8:0x00000000 r9:0x00000000 r10:0x00000000 r11:0x00000000
>> 003157 r12:0x00000000 lr:0x00014949 pc:0x00014978 psr:0x61000000
>> 003157 ICSR:0x00421803 HFSR:0x40000000 CFSR:0x02000000
>> 003157 BFAR:0xe000ed38 MMFAR:0xe000ed34
>>
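The CFSR value in these dumps also tells you what kind of fault was taken.
Here is a small decoding sketch; the bit names are from the ARMv7-M fault
status register layout, the function name is made up. It is run on the CFSR
from this div0 dump and on the one from the original report (0x00040000).

```python
# CFSR = MMFSR (bits 7:0) | BFSR (bits 15:8) | UFSR (bits 31:16)
CFSR_BITS = {
    0: "IACCVIOL", 1: "DACCVIOL", 3: "MUNSTKERR", 4: "MSTKERR",
    5: "MLSPERR", 7: "MMARVALID",
    8: "IBUSERR", 9: "PRECISERR", 10: "IMPRECISERR", 11: "UNSTKERR",
    12: "STKERR", 13: "LSPERR", 15: "BFARVALID",
    16: "UNDEFINSTR", 17: "INVSTATE", 18: "INVPC", 19: "NOCP",
    24: "UNALIGNED", 25: "DIVBYZERO",
}


def decode_cfsr(value):
    """Return the names of the fault flags set in a CFSR value."""
    return [name for bit, name in sorted(CFSR_BITS.items())
            if value & (1 << bit)]


div0_flags = decode_cfsr(0x02000000)    # from the div0 dump
report_flags = decode_cfsr(0x00040000)  # from the original crash report

print("div0 example:   ", div0_flags)
print("original report:", report_flags)
```

DIVBYZERO matches the deliberate crash here. INVPC in the original report
points at an invalid PC load, which is consistent with a corrupted stack.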
>> Then from gdb:
>>
>> Program received signal SIGTRAP, Trace/breakpoint trap.
>> hal_system_reset ()
>> at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>> 50 asm("bkpt");
>> (gdb) bt
>> #0 hal_system_reset ()
>> at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>> #1 0x00008be8 in os_default_irq (tf=0x2000ffc0)
>> at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:171
>> #2 0x0000a5b6 in os_default_irq_asm ()
>> at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
>> #3 <signal handler called>
>> #4 0x00000000 in ?? ()
>> #5 0x0000812c in Reset_Handler ()
>> at
>> repos/apache-mynewt-core/hw/bsp/nrf52dk/src/arch/cortex_m4/gcc_startup_nrf52.s:180
>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>> (gdb) frame 1
>> #1 0x00008be8 in os_default_irq (tf=0x2000ffc0)
>> at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:171
>> 171 hal_system_reset();
>> (gdb) p/x *tf
>> $1 = {ef = 0x20001dd8, r4 = 0x200041d6, r5 = 0x0, r6 = 0x20000318, r7 = 0x0,
>> r8 = 0x0, r9 = 0x0, r10 = 0x0, r11 = 0x0, lr = 0xfffffffd}
>> (gdb) p/x *tf->ef
>> $2 = {r0 = 0x0, r1 = 0x17161, r2 = 0x0, r3 = 0x2a, r12 = 0x0, lr = 0x14949,
>> pc = 0x14978, psr = 0x61000000}
>> (gdb) x/32x 0x20001dd8
>> 0x20001dd8 <os_main_stack+3896>: 0x00000000 0x00017161
>> 0x00000000 0x0000002a
>> 0x20001de8 <os_main_stack+3912>: 0x00000000 0x00014949
>> 0x00014978 0x61000000
>> 0x20001df8 <os_main_stack+3928>: 0x00000003 0x00000000
>> 0x00000000 0x0000002a
>> 0x20001e08 <os_main_stack+3944>: 0x00000001 0x00000002
>> 0x0000000a 0x00014a21
>> 0x20001e18 <os_main_stack+3960>: 0x00014a15 0x0000ebd9
>> 0x00000000 0x200041d0
>> 0x20001e28 <os_main_stack+3976>: 0x200041d6 0x00000000
>> 0x0000000a 0x0001574d
>> 0x20001e38 <os_main_stack+3992>: 0x00015741 0x0000c925
>> 0x200041d0 0x00000011
>> 0x20001e48 <os_main_stack+4008>: 0x00000073 0x200041d3
>> 0x00000000 0x0000ede9
>> (gdb) p &__text
>> $3 = (<data variable, no debug info> *) 0x8020 <__isr_vector>
>> (gdb) p &__etext
>> $4 = (<data variable, no debug info> *) 0x175f0
>> (gdb) x/i 0x00017161
>> 0x17161: movs r0, r0
>> (gdb) x/i 0x00014949
>> 0x14949 <crash_device+12>: cbz r0, 0x1496a <crash_device+46>
>> (gdb) x/i 0x00014978
>> 0x14978 <crash_device+60>: sdiv r3, r3, r2
>> (gdb) x/i 0x00014a21
>> 0x14a21 <crash_cli_cmd+12>: cbz r0, 0x14a28 <crash_cli_cmd+20>
>> (gdb) x/i 0x00014a15
>> 0x14a15 <crash_cli_cmd>: push {r3, lr}
>> (gdb) list *0x14949
>> 0x14949 is in crash_device
>> (repos/apache-mynewt-core/test/crash_test/src/crash_test.c:42).
>> warning: Source file is more recent than executable.
>> 37 int
>> 38 crash_device(char *how)
>> 39 {
>> 40 volatile int val1, val2, val3;
>> 41
>> 42 if (!strcmp(how, "div0")) {
>> 43
>> 44 val1 = 42;
>> 45 val2 = 0;
>> 46
>> (gdb) list *0x00014a21
>> 0x14a21 is in crash_cli_cmd
>> (repos/apache-mynewt-core/test/crash_test/src/crash_cli.c:41).
>> 36 };
>> 37
>> 38 static int
>> 39 crash_cli_cmd(int argc, char **argv)
>> 40 {
>> 41 if (argc >= 2 && crash_device(argv[1]) == 0) {
>> 42 return 0;
>> 43 }
>> 44 console_printf("Usage crash [div0|jump0|ref0|assert|wdog]\n");
>> 45 return 0;
>> (gdb) list *0x14a21
>> 0x14a21 is in crash_cli_cmd
>> (repos/apache-mynewt-core/test/crash_test/src/crash_cli.c:41).
>> 36 };
>> 37
>> 38 static int
>> 39 crash_cli_cmd(int argc, char **argv)
>> 40 {
>> 41 if (argc >= 2 && crash_device(argv[1]) == 0) {
>> 42 return 0;
>> 43 }
>> 44 console_printf("Usage crash [div0|jump0|ref0|assert|wdog]\n");
>> 45 return 0;
>>
>> good luck.
>>
>>> On Aug 31, 2018, at 2:10 PM, Aditya Xavier <[email protected]>
>>> wrote:
>>>
>>> It seems OS_CRASH_STACKTRACE was introduced after 1.4.1 and hence is not in
>>> the release.
>>>
>>> If I change the release, I believe there would be many API changes to be
>>> done on MESH side.
>>>
>>> Can you guide me on how to “manually walk the stack looking for things
>>> which look like pointers to text”?
>>>
>>> My gdb skills are pretty weak.
>>>
>>> I tried gdb where, with the following outcome.
>>>
>>> (gdb) c
>>> Continuing.
>>>
>>>
>>> Program received signal SIGTRAP, Trace/breakpoint trap.
>>> hal_system_reset () at
>>> repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>> 50 asm("bkpt");
>>> (gdb)
>>> Continuing.
>>>
>>> Program received signal SIGTRAP, Trace/breakpoint trap.
>>> hal_system_reset () at
>>> repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>> 50 asm("bkpt");
>>> (gdb) where
>>> #0 hal_system_reset () at
>>> repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>> #1 0x0000bf2e in os_default_irq (tf=0x2000ffc8) at
>>> repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
>>> #2 0x0000da56 in os_default_irq_asm () at
>>> repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
>>> #3 <signal handler called>
>>> #4 0x00000000 in ?? ()
>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>>
>>>
>>>
>>>> On 31-Aug-2018, at 4:30 PM, marko kiiskila <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>>> On Aug 31, 2018, at 1:47 PM, Aditya Xavier <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Hi !
>>>>>
>>>>> I am having an issue with sending and receiving a Mesh message, though I
>>>>> am positive the problem is related to releasing the semaphore.
>>>>>
>>>>> Action Received over MESH Length :- 15
>>>>> 012273 Unhandled interrupt (3), exception sp 0x2000abd0
>>>>> 012273 r0:0xd7229882 r1:0xd929b3bb r2:0xcf0f98cb r3:0x5c5a76b3
>>>>> 012273 r4:0x1b000000 r5:0x2000acc0 r6:0x2000aca0 r7:0x00000008
>>>>> 012273 r8:0x00000000 r9:0x200008a9 r10:0x2000ad28 r11:0x00011d91
>>>>> 012273 r12:0x681af5c8 lr:0xb1334673 pc:0x7e3cdeb8 psr:0x2266a80b
>>>>> 012273 ICSR:0x00411803 HFSR:0x40000000 CFSR:0x00040000
>>>>> 012273 BFAR:0xe000ed38 MMFAR:0xe000ed34
>>>>>
>>>>> I am sending a group mesh message for testing. The sequence of events is
>>>>> as follows.
>>>>>
>>>>> Button TASK -> send message over MESH -> Mesh receives message on model
>>>>> -> copies the data and releases the semaphore for another task ->
>>>>> LOG Task starts and logs the message.
>>>>>
>>>>> In this entire flow, the moment I receive the message and release the
>>>>> semaphore the firmware crashes.
>>>>>
>>>>> I tried increasing the STACK size of the LOG task, however that didn’t
>>>>> help.
>>>>>
>>>>> Could someone let me know how to understand where / why the crash is
>>>>> happening ?
>>>>
>>>> Looking at your registers they seem to be garbage, so I’m guessing stack
>>>> corruption of some sort; does not have to be overflow.
>>>> Try turning on OS_CRASH_STACKTRACE, or manually walk the stack looking
>>>> for things which look like pointers to text.
>>>>
>>>>
>>>
>>
>