Re: newtmgr image upload nrf51dk disconnects with reason=8

Jacob Rosenthal Thu, 20 Apr 2017 11:17:10 -0700

Indeed the disconnect is a result of the erase. If I comment that out, I can
get to a stack overflow.


The newt tool uses 56 for the first packet and 64 after. Not sure why yet, but
let's just say I hardcode 56 in my node tool.

bleprph and blesplit have
OS_MAIN_STACK_SIZE: 428

Oddly enough, it only has to be 8 more to work: OS_MAIN_STACK_SIZE: 436

Though it could probably use more headroom than that.

Thoughts on what to do about flash erase disconnecting?


On Thu, Apr 20, 2017 at 10:39 AM, Alan Graves <[email protected]>
wrote:

> Reminds me of that old Wendy's TV commercial:
> "Where's the deadbeef?"
>
> Alan
>
> -----Original Message-----
> From: marko kiiskila [mailto:[email protected]]
> Sent: Wednesday, April 19, 2017 5:00 PM
> To: [email protected]
> Subject: Re: newtmgr image upload nrf51dk disconnects with reason=8
>
>
> > On Apr 19, 2017, at 4:33 PM, Jacob Rosenthal <[email protected]>
> wrote:
> >
> > On Wed, Apr 19, 2017 at 11:19 AM, marko kiiskila <[email protected]>
> wrote:
> >
> >> Just general comments, I hope I’m not saying things which are too
> >> obvious.
> >>
> > More specific would be even better :) I don't think my gdb is up to par
> >
> >> Either g_os_run_list or one of the task structures is getting smashed.
> >> As you know your tasks beforehand, you can walk through them manually
> >> to figure out which one it is.
> >>
> > How do I know the tasks beforehand? I would guess something in
> > imgr_upload is corrupting it? So print as that function starts and
> > ends? How do I walk through them manually?
>
> You could do this, for example:
>
> (gdb) source repos/apache-mynewt-core/compiler/gdbmacros/os.gdb
> (gdb) os_tasks
>  prio state      stack  stksz       task name
> * 255   0x1    0xae7d4  16384    0x9e780 idle
>   127   0x2    0x9b128   5376    0x95cd8 main
>     0   0x2    0x95a2c  16384    0x859dc uartpoll
>     2   0x2    0xb4338   4096    0x9d1d0 socket
>     9   0x2    0x85908   4096    0x818a8 ble_hs
>
> This was from a native build target I happened to have a debugger attached to,
> but you would get the same type of data from actual targets as well.
>
> The pointer to os_task structure is under the ‘task’ column. Here I'm
> picking the idle task for closer inspection:
>
> (gdb) set print pretty
> (gdb) p *(struct os_task *)0x9e780
> $3 = {
>   t_stackptr = 0xae63c <g_idle_task_stack+65128>,
>   t_stacktop = 0xae7d4 <g_os_idle_ctr>,
>   t_stacksize = 16384,
>   t_taskid = 0 '\000',
>   t_prio = 255 '\377',
>   t_state = 1 '\001',
>   t_flags = 0 '\000',
>   t_lockcnt = 0 '\000',
>   t_pad = 0 '\000',
>   t_name = 0x6b8d8 "idle",
>   t_func = 0x192f0 <os_idle_task>,
>   t_arg = 0x0,
>   t_obj = 0x0,
>   t_sanity_check = {
>     sc_checkin_last = 0,
>     sc_checkin_itvl = 0,
>     sc_func = 0x0,
>     sc_arg = 0x0,
>     sc_next = {
>       sle_next = 0x0
>     }
>   },
>   t_next_wakeup = 0,
>   t_run_time = 52837,
>   t_ctx_sw_cnt = 50124,
>   t_os_task_list = {
>     stqe_next = 0x95cd8 <os_main_task>
>   },
>   t_os_list = {
>     tqe_next = 0x0,
>     tqe_prev = 0x8143c <g_os_run_list>
>   },
>   t_obj_list = {
>     sle_next = 0x0
>   }
> }
>
> And then I’ll compute where the task stack starts, t_stacktop -
> sizeof(os_stack_t) * t_stacksize
>
> (gdb) x/x 0xae7d4-16384*4
> 0x9e7d4 <g_idle_task_stack>:    0xdeadbeef
>
> So that’s where the stack starts (its lowest address). Then I’ll inspect
> that end of the stack to see if it still has the fill pattern ‘0xdeadbeef’
>
> (gdb) x/32x 0x9e7d4
> 0x9e7d4 <g_idle_task_stack>:    0xdeadbeef      0xdeadbeef      0xdeadbeef      0xdeadbeef
> 0x9e7e4 <g_idle_task_stack+16>: 0xdeadbeef      0xdeadbeef      0xdeadbeef      0xdeadbeef
> 0x9e7f4 <g_idle_task_stack+32>: 0xdeadbeef      0xdeadbeef      0xdeadbeef      0xdeadbeef
> 0x9e804 <g_idle_task_stack+48>: 0xdeadbeef      0xdeadbeef      0xdeadbeef      0xdeadbeef
> 0x9e814 <g_idle_task_stack+64>: 0xdeadbeef      0xdeadbeef      0xdeadbeef      0xdeadbeef
>
> So this stack has not been used completely.
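
The manual check above (take t_stacktop, subtract the stack size in words, then look for the 0xdeadbeef fill) can be sketched in C as below. This is only a rough illustration, not code from Mynewt: stack_word_t, stack_words_unused() and stack_words_used() are made-up names, and it assumes 32-bit stack words (matching the "*4" in the gdb arithmetic) and a stack that grows toward lower addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill pattern seen in the gdb dump above. */
#define STACK_FILL_PATTERN 0xdeadbeefu

/* Assumption: one stack word is 32 bits wide. */
typedef uint32_t stack_word_t;

/*
 * Count the words at the low-address end of the stack that still hold
 * the fill pattern. stack_top points one word past the highest stack
 * word (like t_stacktop); size_words is the stack size in words
 * (like t_stacksize).
 */
size_t
stack_words_unused(const stack_word_t *stack_top, size_t size_words)
{
    const stack_word_t *base;
    size_t unused = 0;

    assert(stack_top != NULL);

    /* Same computation as "x/x 0xae7d4-16384*4" in gdb. */
    base = stack_top - size_words;

    /* Walk up from the lowest address until the pattern stops. */
    while (unused < size_words && base[unused] == STACK_FILL_PATTERN) {
        unused++;
    }
    return unused;
}

/* Deepest observed use of the stack, in words. */
size_t
stack_words_used(const stack_word_t *stack_top, size_t size_words)
{
    return size_words - stack_words_unused(stack_top, size_words);
}
```

If stack_words_unused() comes back as 0, the stack has been used all the way down and has probably overflowed. The result is only an estimate: a task can move its stack pointer past words without writing them, and the scan stops at the first word that differs from the pattern.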
>
> >
> >>
> >> Usually the culprit is a stack overflow, so once you find out which task
> >> structure is being corrupted, look for the stack just after that in
> >> memory.
> >>
> >> nm output piped to sort is your friend in locating that stack.
> >>
> > nm output?
>
> [pi@raspberrypi:~/src/incubator-mynewt-blinky]$ arm-linux-gnueabihf-nm \
>     bin/targets/bleprph_oic_linux/app/apps/bleprph_oic/bleprph_oic.elf | sort | more
>
> I.e. get the symbols from my ELF file and sort them by address.
> Then let’s look at what the idle stack would overwrite if it were
> not big enough:
>
> ...
> 0009e780 B g_idle_task
> 0009e7d0 B g_os_started
> 0009e7d4 B g_idle_task_stack
>
> Looks like an idle stack overflow would most likely corrupt those 2 items
> first. And if it corrupts that task structure, it’s game over.
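
The "sort symbols by address, then see what sits just below the stack" step can also be written out in C. A minimal sketch, assuming the stack grows toward lower addresses so an overflow hits the symbol just below the stack first. struct sym, syms_sort() and sym_below() are hypothetical names for illustration, and the table would have to be filled by hand from nm output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One line of nm output: address and symbol name. */
struct sym {
    uint32_t addr;
    const char *name;
};

/* qsort comparator: ascending by address. */
static int
sym_cmp(const void *a, const void *b)
{
    const struct sym *sa = a;
    const struct sym *sb = b;

    if (sa->addr < sb->addr) {
        return -1;
    }
    if (sa->addr > sb->addr) {
        return 1;
    }
    return 0;
}

/* Equivalent of piping nm output through sort. */
void
syms_sort(struct sym *syms, size_t n)
{
    qsort(syms, n, sizeof(syms[0]), sym_cmp);
}

/*
 * In an address-sorted table, return the symbol with the highest
 * address strictly below addr, or NULL if there is none. Given the
 * address of a stack array, this is the first victim of an overflow.
 */
const struct sym *
sym_below(const struct sym *sorted, size_t n, uint32_t addr)
{
    const struct sym *best = NULL;
    size_t i;

    assert(sorted != NULL || n == 0);

    for (i = 0; i < n && sorted[i].addr < addr; i++) {
        best = &sorted[i];
    }
    return best;
}
```

With the three symbols from the nm listing above, looking below g_idle_task_stack (0x9e7d4) gives g_os_started, and looking below that gives g_idle_task.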
>
> BTW, gdb scripts looking for task stack use are missing. We probably
> should have such :)
>
> Happy hacking,
> M
>
> >>
> >>
> >> Hope this helps,
> >> M
> >>
> >
> > Thanks for the help
>
>
