Nathan,

What's strange is that the same master source (nuttx hash e83606732d5e71eb98a9eb544537dbbeb71aa58b, apps hash d48b45000d1d083082f7a1650f351573c36a87d0) with INIT_STACKSIZE=2048 in the default .config passes on nucleo-f446re but fails on nucleo-h743zi2 (run on my nucleo-h753zi board) when I try "time ls".  I turned on all the stack checks just to be sure nucleo-f446re wasn't just "lucky".

On 2/7/26 23:54, Nathan Hartman wrote:
Yeah, it's usually the stack, but does anyone know why it needs to be enlarged now? Is something using more stack than before?

On Sat, Feb 7, 2026 at 5:28 PM Peter Barada <[email protected]> wrote:

    Cranking up CONFIG_INIT_STACKSIZE to 3072 fixes the issue.

    I tried enabling STACK_COLORATION, STACK_USAGE, and ARMV7M_STACKTRACE
    while leaving INIT_STACKSIZE at 2048, hoping to debug using
    STM32CubeIDE, but when I try "time ls" the GDB session is lost (which
    seems strange).

    If I then enable ARMV7M_STACKCHECK_BREAKPOINT, GDB stops when it
    detects the stack overflow and I can get a call stack to understand
    why, but I can't continue (to show the dump).

    Finally, after enabling ARCH_STACKDUMP, ARMV7M_STACKCHECK,
    SCHED_BACKTRACE, STACK_COLORATION, and STACK_USAGE, disabling
    ARMV7M_STACKCHECK_BREAKPOINT, and setting ARCH_INTERRUPTSTACK=2048 and
    ARCH_STACKDUMP_MAX_LENGTH=1024, I get a full dump when it detects the
    stack overflow.
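
For anyone following along, here is a rough model of what stack coloration does. It is a simplified sketch in Python, not the NuttX implementation: the fill pattern 0xDEADBEEF and the function names are assumptions made for this example. The stack is pre-filled with a known word, and the high-water mark is found later by counting how many words at the low end still hold that word.

```python
# Simplified model of stack coloration on a descending stack.
# STACK_COLOR and all function names are illustrative assumptions.
STACK_COLOR = 0xDEADBEEF


def color_stack(num_words):
    """Return a freshly 'colored' stack: every word holds the pattern."""
    return [STACK_COLOR] * num_words


def simulate_use(stack, words_used):
    """Overwrite words from the top (end of list) downward, as a
    descending stack would when frames are pushed."""
    for i in range(1, words_used + 1):
        stack[-i] = 0


def high_water_mark(stack):
    """Count words that were ever written, by scanning up from the
    low end until the first word that no longer holds the pattern."""
    untouched = 0
    for word in stack:
        if word != STACK_COLOR:
            break
        untouched += 1
    return len(stack) - untouched


stack = color_stack(512)          # 512 words = 2048 bytes
simulate_use(stack, 300)
print(high_water_mark(stack))     # words used at the deepest point
```

One caveat with this kind of check: once the stack has already run past its base, every word has been overwritten, so the reported usage is simply the full stack size and says nothing about how far the overflow went.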

    Thanks for the help!


    On 2/7/26 03:25, raiden00pl wrote:
    > hi, this is a 100% stack issue. Increase all stack sizes to at least 4092.
    > Another option is to enable full optimisation with CONFIG_DEBUG_FULLOPT=y,
    > should also help.
    >
    > quick tip: about 80% of crashes in NuttX are stack issues, the first thing
    > you always do when such crashes occur is to increase all stack sizes :)
    >
    > On Sat, Feb 7, 2026 at 04:02 Matteo Golin <[email protected]> wrote:
    >
    >> I am not familiar enough, but there should be an option for stack canaries.
    >> I haven't had much luck with that configuration, and I imagine that your
    >> DEBUGASSERT will trigger before stack smashing is detected.
    >>
    >> Matteo
    >>
    >> On Fri, Feb 6, 2026, 8:45 PM Peter Barada <[email protected]> wrote:
    >>
    >>> Haven't tried yet (personally I feel I should know _why_ it happens) - is
    >>> there a config for compiling in stack checking on function entry?
    >>> On 2/6/26 20:22, Matteo Golin wrote:
    >>>
    >>> Hmmm, if the problem goes that far back it may not be worth triaging that
    >>> way. Things have probably diverged so much since then. No luck with the
    >>> stack increase?
    >>>
    >>> Matteo
    >>>
    >>> On Fri, Feb 6, 2026, 8:18 PM Peter Barada <[email protected]> wrote:
    >>>> Matteo,
    >>>>
    >>>> I'm walking back release points and have had to change board
    >>>> configuration names (to nucleo-h743zi), rename nuttx-apps to apps, and
    >>>> am still seeing the fault in the release/11.0 branch.
    >>>>
    >>>> I'm trying to go back further but wondering if I'll find a bisect start
    >>>> point...
    >>>> On 2/6/26 17:05, Matteo Golin wrote:
    >>>>
    >>>> Hi Peter,
    >>>>
    >>>> My approach is kind of a headache since bisecting over an area where apps
    >>>> and NuttX are not always in sync is a major limitation of the split repo.
    >>>> My approach is usually:
    >>>>
    >>>> - Start the bisect in kernel
    >>>> - Check the commit date of the current HEAD
    >>>> - Check out to a commit of the same/similar date in apps
    >>>> - Build
    >>>> - Mentally note if this commit was good or bad based on the results of
    >>>>   running the image
    >>>> - make distclean (avoids artifacts carrying over between bisections and
    >>>>   breaking everything)
    >>>> - Mark commit good or bad with git bisect
    >>>>
    >>>> Then basically repeat this until bisecting is finished. It sucks and I
    >>>> did suggest a script in /tools/ to try and automate most of this, but I
    >>>> never got around to writing it.
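
The "same/similar date in apps" step above is the part that is easiest to automate. Below is a minimal sketch of it in Python. It is not an existing NuttX tool: the function name and the sample history are hypothetical. It assumes you already have the apps history as (commit timestamp, hash) pairs, for example collected from something like `git log --format='%ct %H'` in the apps tree and sorted by timestamp, and it picks the newest apps commit that is not newer than the kernel commit under test.

```python
from bisect import bisect_right


def matching_apps_commit(kernel_ts, apps_history):
    """Pick the newest apps commit whose timestamp is <= kernel_ts.

    apps_history: list of (unix_timestamp, commit_hash) tuples, sorted
    by timestamp in ascending order. Returns None when every apps
    commit is newer than the kernel commit.
    """
    timestamps = [ts for ts, _ in apps_history]
    idx = bisect_right(timestamps, kernel_ts)
    if idx == 0:
        return None
    return apps_history[idx - 1][1]


# Placeholder data for illustration only, not real commits.
history = [(100, "apps-a"), (200, "apps-b"), (300, "apps-c")]
print(matching_apps_commit(250, history))
```

A wrapper would then check out the returned hash in apps, build, run "make distclean", and report the result back to git bisect in the kernel tree. Matching by date is only a heuristic, so a pairing that fails to build is probably better skipped than marked bad.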
    >>>>
    >>>> I would suggest you start by checking for the issue on a stable release
    >>>> (i.e. 12.12.0) to see if that's a good commit you can start from. Usually
    >>>> those releases have a higher degree of testing because everyone who voted
    >>>> for the release ran some images on their hardware.
    >>>>
    >>>> That's honestly a lot of work but you never know if it'll end up being
    >>>> faster than trying to triage with logs!
    >>>>
    >>>> Matteo
    >>>>
    >>>> On Fri, Feb 6, 2026, 4:50 PM Nathan Hartman <[email protected]> wrote:
    >>>>
    >>>>> First place I would look: is the stack overflowing? (You could try
    >>>>> enabling some of the stack debugging features.)
    >>>>>
    >>>>> On Fri, Feb 6, 2026 at 4:34 PM Peter Barada <[email protected]> wrote:
    >>>>>
    >>>>>> Matteo,
    >>>>>>
    >>>>>> I don't know if this was working before but if you can suggest a good
    >>>>>> starting point I can cycle through git bisect to narrow down to the
    >>>>>> failing commit.  What's the best approach to using git bisect across
    >>>>>> multiple repos (since changes in nuttx may have necessary changes in
    >>>>>> nuttx-apps and need to keep them in sync at each build point)?
    >>>>>>
    >>>>>> As an aside, I also have a nucleo-f446re board and 'time ls' works fine
    >>>>>> there.
    >>>>>>
    >>>>>> Further, does anyone have GDB scripts that make it easier to decipher
    >>>>>> NuttX structures from memory (e.g. dump task/semaphore lists, etc)?
    >>>>>> I've started cobbling snippets but figure I'd ask before reinventing the
    >>>>>> wheel.
    >>>>>>
    >>>>>>
    >>>>>> On 2/6/26 16:12, Matteo Golin wrote:
    >>>>>>> Hi Peter,
    >>>>>>>
    >>>>>>> If you happen to know that this was working before on an older NuttX
    >>>>>>> version, you could use git bisect to narrow down the breaking commit.
    >>>>>>> Then the issue might be clearer.
    >>>>>>>
    >>>>>>> Best,
    >>>>>>> Matteo
    >>>>>>>
    >>>>>>> On Fri, Feb 6, 2026, 4:09 PM Peter Barada <[email protected]> wrote:
    >>>>>>>      I have a STM32 Nucleo-h753zi board - and configured a build for
    >>>>>>>      nucleo-h743zi2:nsh (which is the closest board/chip; the stm32h753zi
    >>>>>>>      is the same as stm32h743zi but h753zi includes crypto acceleration
    >>>>>>>      hardware). Build works, but if I boot and try 'time ls' nuttx faults:
    >>>>>>>
    >>>>>>>      nsh> uname -a
    >>>>>>>      NuttX 0.0.0 9ecfff0833 Feb  6 2026 15:45:28 arm
    nucleo-h743zi2
    >>>>>>>      nsh> time ls
    >>>>>>>      /:
    >>>>>>>        dev/
    >>>>>>>
    >>>>>>>      0.00dump_assert_info: Current Version: NuttX  0.0.0 9ecfff0833 Feb  6 2026 15:45:28 arm
    >>>>>>>      dump_assert_info: Assertion failed panic: at file: :0 task: <noname> process: <noname> 0x800c9fd
    >>>>>>>      up_dump_register: R0: 0801e624 R1: 0000000a R2: 00000050  R3: 0000000a
    >>>>>>>      up_dump_register: R4: 00000001 R5: 240000e4 R6: 00000000  FP: 00000000
    >>>>>>>      up_dump_register: R8: 00000000 SB: 00000000 SL: 00000000 R11: 00000000
    >>>>>>>      up_dump_register: IP: 00000000 SP: 38000c08 LR: 080059db  PC: 08005984
    >>>>>>>      up_dump_register: xPSR: 41000000 BASEPRI: 00000000 CONTROL: 00000000
    >>>>>>>      up_dump_register: EXC_RETURN: ffffffe9
    >>>>>>>      dump_stackinfo: User Stack:
    >>>>>>>      dump_stackinfo:   base: 0x38000518
    >>>>>>>      dump_stackinfo:   size: 00002000
    >>>>>>>      dump_stackinfo:     sp: 0x38000c08
    >>>>>>>      stack_dump: 0x38000be8: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    >>>>>>>      stack_dump: 0x38000c08: 0000000a 0801e624 0801e624 38000200 38000fac 00000000 0801e624 080172c1
    >>>>>>>      stack_dump: 0x38000c28: 00000000 0801e624 38000200 38000158 00000000 00000000 38000fac 0800caa1
    >>>>>>>      stack_dump: 0x38000c48: 00000000 0800cc77 0801e624 000002fc 38000500 00000001 00000001 38000cf0
    >>>>>>>      stack_dump: 0x38000c68: 38000cf0 00000008 38000200 00000000 00000000 0800ca79 38000500 00000001
    >>>>>>>      stack_dump: 0x38000c88: 00000064 38000cf0 00000064 0800ca33 38000500 00000001 00000064 00000000
    >>>>>>>      stack_dump: 0x38000ca8: 00000000 08009325 00000000 38000500 00000001 0800c9fd 00000000 080052f1
    >>>>>>>      stack_dump: 0x38000cc8: 00000000 38000500 00000000 38000158 00000001 00000001 00000000 00000000
    >>>>>>>      stack_dump: 0x38000ce8: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    >>>>>>>      dump_tasks:    PID GROUP PRI POLICY   TYPE    NPX STATE  EVENT      SIGMASK STACKBASE  STACKSIZE   COMMAND
    >>>>>>>      dump_task:       0     0  0 FIFO     Kthread -   Ready 0000000000000000 0x240018b0      1000   <noname>
    >>>>>>>      dump_task:       1     1 100 RR       Task    -   Running 0000000000000000 0x38000518      2000   <noname> ��]���&
    >>>>>>>
    >>>>>>>      Wondering if anyone has run across this before?  Backtrace shows:
    >>>>>>>      Program received signal SIGTRAP, Trace/breakpoint trap.
    >>>>>>>      exception_common () at armv7-m/arm_exception.S:127
    >>>>>>>      127             mrs      r0, ipsr           /* R0=exception number */
    >>>>>>>      where
    >>>>>>>      #0  exception_common () at armv7-m/arm_exception.S:127
    >>>>>>>      #1  <signal handler called>
    >>>>>>>      #2  0x08005984 in env_cmpname (pszname=0x801e624 "PS1",
    >>>>>>>           peqname=0xa <error: Cannot access memory at address 0xa>)
    >>>>>>>           at environ/env_findvar.c:50
    >>>>>>>      #3  0x080059da in env_findvar (group=0x38000200, pname=0x801e624 "PS1")
    >>>>>>>           at environ/env_findvar.c:105
    >>>>>>>      #4  0x080172c0 in getenv (name=0x801e624 "PS1") at environ/env_getenv.c:89
    >>>>>>>      #5  0x0800caa0 in nsh_update_prompt () at nsh_prompt.c:77
    >>>>>>>      #6  0x0800cc76 in nsh_session (pstate=0x38000cf0, login=1, argc=1,
    >>>>>>>           argv=0x38000500) at nsh_session.c:249
    >>>>>>>      #7  0x0800ca78 in nsh_consolemain (argc=1, argv=0x38000500)
    >>>>>>>           at nsh_consolemain.c:77
    >>>>>>>      #8  0x0800ca32 in nsh_main (argc=1, argv=0x38000500) at nsh_main.c:76
    >>>>>>>      #9  0x08009324 in nxtask_startup (entrypt=0x800c9fd <nsh_main>, argc=1,
    >>>>>>>           argv=0x38000500) at sched/task_startup.c:72
    >>>>>>>      #10 0x080052f0 in nxtask_start () at task/task_start.c:104
    >>>>>>>      #11 0x00000000 in ?? ()
    >>>>>>>
    >>>>>>>      Scratching the surface shows that env_findvar() is called with a group
    >>>>>>>      pointer of 0x38000200, group->tg_envp is 0x380004b8, both of which are
    >>>>>>>      reasonable. But *group->tg_envp is 0xA.  Further, if I "watch
    >>>>>>>      *(int*)0x380004b8" in GDB, I see it is getting overwritten by
    >>>>>>>      up_serialout() invoked from stm32_serial.c::up_send.
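
The addresses quoted above can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes the dump_stackinfo "size" field is in decimal bytes and that the stack is full-descending (as on ARMv7-M), and the helper name describe_stack is made up for this example. Under those assumptions the watched address 0x380004b8 sits 96 bytes below the stack base, which is the region a descending stack would write into once it runs past its base.

```python
# Values copied from the dump and GDB session quoted above.
STACK_BASE = 0x38000518    # dump_stackinfo "base" (lowest stack address)
STACK_SIZE = 2000          # dump_stackinfo "size", read as decimal bytes (assumption)
SP_AT_FAULT = 0x38000C08   # dump_stackinfo "sp"
WATCHED_ADDR = 0x380004B8  # value of group->tg_envp, the overwritten location


def describe_stack(base, size, sp, addr):
    """Relate an address of interest to a full-descending stack region."""
    top = base + size
    return {
        "top": top,                         # initial stack pointer
        "in_use_at_fault": top - sp,        # bytes between top and sp
        "addr_in_stack": base <= addr < top,
        "addr_below_base": base - addr,     # positive: addr lies under the stack
    }


info = describe_stack(STACK_BASE, STACK_SIZE, SP_AT_FAULT, WATCHED_ADDR)
print(hex(info["top"]))          # 0x38000ce8
print(info["in_use_at_fault"])   # 224
print(info["addr_in_stack"])     # False
print(info["addr_below_base"])   # 96
```

Only 224 bytes are in use at the moment of the fault, so the overflow itself would have happened earlier, in a deeper call chain. That is consistent with the watchpoint firing inside up_serialout() during the output of "time ls", though it does not prove it.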
    >>>>>>>
    >>>>>>>      Any suggestions on how I can best track this down further?
    >>>>>>>
    >>>>>>>      Thanks in advance!
    >>>>>>>
    >>>>>>>      --
    >>>>>>>      Peter Barada
    >>>>>>> [email protected]
    >>>>>>>

--
Peter Barada
[email protected]
