Nathan,

I believe you're misunderstanding me.  I have two boards: nucleo-f446re and nucleo-h753zi.  I created two separate builds (nucleo-f446re:nsh and nucleo-h743zi2:nsh) from the same source hashes using stock defconfigs, then ran each build on the corresponding board and tested "time ls" at boot.  The nucleo-f446re:nsh build passed on the nucleo-f446re, while the nucleo-h743zi2:nsh build failed on the nucleo-h753zi board.

Yes, the only difference between the h743zi and h753zi chips is that the h753zi contains the crypto accelerator.

On 2/8/26 16:57, Nathan Hartman wrote:
Hi Peter,

That is interesting (and strange) indeed. IIRC the only difference between those two chips is that the 753 has built-in crypto accelerators while the 743 does not. I believe that a firmware image built for one will work correctly on the other (provided obviously that the firmware does not attempt to access the crypto accelerators).

Did you make a separate build for each chip?

Or did you flash an *identical* image to both boards with stack size = 2048 and the same image succeeded on the 753 and failed on the 743?

I'm asking because if it's an identical image, that would require a quite different debugging strategy than if it was a separate build for each chip.

Thanks,
Nathan

On Sun, Feb 8, 2026 at 3:11 PM Peter Barada <[email protected]> wrote:

    Nathan,

    What's strange is that the same master source (nuttx hash
    e83606732d5e71eb98a9eb544537dbbeb71aa58b, apps hash
    d48b45000d1d083082f7a1650f351573c36a87d0) with INIT_STACKSIZE=2048
    in the default .config passes on nucleo-f446re but fails on
    nucleo-h743zi2 (run on my nucleo-h753zi board) when I try "time
    ls".  I turned on all the stack checks just to be sure
    nucleo-f446re wasn't just "lucky".

    On 2/7/26 23:54, Nathan Hartman wrote:
    Yeah, it's usually the stack, but does anyone know why it needs
    to be enlarged now? Is something using more stack than before?

    On Sat, Feb 7, 2026 at 5:28 PM Peter Barada
    <[email protected]> wrote:

        Cranking up CONFIG_INIT_STACKSIZE to 3072 fixes the issue.

        I tried enabling STACK_COLORATION, STACK_USAGE, and
        ARMV7M_STACKTRACE while leaving INIT_STACKSIZE at 2048, hoping
        to debug with STM32CubeIDE, but when I try "time ls" the GDB
        session is lost (which seems strange).

        If I then enable ARMV7M_STACKCHECK_BREAKPOINT, GDB stops when
        it detects the stack overflow and I can get a call stack to
        understand why, but I can't continue (to show the dump).

        Finally, after enabling ARCH_STACKDUMP, ARMV7M_STACKCHECK,
        SCHED_BACKTRACE, STACK_COLORATION, and STACK_USAGE, disabling
        STACKCHECK_BREAKPOINT, and setting ARCH_INTERRUPTSTACK=2048
        and ARCH_STACKDUMP_MAX_LENGTH=1024, I get a full dump when it
        detects the stack overflow.
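
For what it's worth, the numbers in the original report further down this thread can be cross-checked with a bit of arithmetic. The sketch below is only that, a sketch: it assumes the dump_stackinfo "size" of 2000 is decimal bytes and that the stack grows downward from base + size, and the function and variable names are made up for illustration.

```python
# Values copied from the dump_stackinfo output and GDB notes in the original report.
STACK_BASE = 0x38000518    # dump_stackinfo "base"
STACK_SIZE = 2000          # dump_stackinfo "size" (assumed to be decimal bytes)
SP_AT_FAULT = 0x38000C08   # dump_stackinfo "sp"
WATCHED_ADDR = 0x380004B8  # group->tg_envp, the location seen being overwritten


def stack_report(base, size, sp, watched):
    """Summarise where sp and a watched address sit relative to a descending stack."""
    top = base + size  # highest address of the allocation; the stack grows down from here
    return {
        "top": top,
        "bytes_in_use_at_fault": top - sp,
        "watched_inside_stack": base <= watched < top,
        "watched_bytes_below_base": base - watched,
    }


report = stack_report(STACK_BASE, STACK_SIZE, SP_AT_FAULT, WATCHED_ADDR)
for key, value in report.items():
    # bool is a subclass of int, so only format true integers as hex
    print(key, hex(value) if type(value) is int else value)
```

Under those assumptions the watched address is outside the stack allocation and sits just below its base, which is consistent with (though not proof of) a stack that overran its low end at some earlier, deeper point in the call chain.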

        Thanks for the help!


        On 2/7/26 03:25, raiden00pl wrote:
        > hi, this is a 100% stack issue. Increase all stack sizes to at least 4092.
        > Another option is to enable full optimisation with CONFIG_DEBUG_FULLOPT=y,
        > should also help.
        >
        > quick tip: about 80% of crashes in NuttX are stack issues, the first thing
        > you always do when such crashes occur is to increase all stack sizes :)
        >
        > On Sat, Feb 7, 2026 at 04:02, Matteo Golin <[email protected]> wrote:
        >
        >> I am not familiar enough, but there should be an option for stack canaries.
        >> I haven't had much luck with that configuration, and I imagine that your
        >> DEBUGASSERT will trigger before stack smashing is detected.
        >>
        >> Matteo
        >>
        >> On Fri, Feb 6, 2026, 8:45 PM Peter Barada <[email protected]> wrote:
        >>
        >>> Haven't tried yet (personally I feel I should know _why_ it happens) - is
        >>> there a config for compiling in stack checking on function entry?
        >>> On 2/6/26 20:22, Matteo Golin wrote:
        >>>
        >>> Hmmm, if the problem goes that far back it may not be worth triaging that
        >>> way. Things have probably diverged so much since then. No luck with the
        >>> stack increase?
        >>>
        >>> Matteo
        >>>
        >>> On Fri, Feb 6, 2026, 8:18 PM Peter Barada <[email protected]> wrote:
        >>>> Matteo,
        >>>>
        >>>> I'm walking back release points and have had to change board
        >>>> configuration names (to nucleo-h743zi), rename nuttx-apps to apps, and
        >>>> am still seeing the fault in the release/11.0 branch.
        >>>>
        >>>> I'm trying to go back further but wondering if I'll find a bisect start
        >>>> point...
        >>>> On 2/6/26 17:05, Matteo Golin wrote:
        >>>>
        >>>> Hi Peter,
        >>>>
        >>>> My approach is kind of a headache since bisecting over an area where apps
        >>>> and NuttX are not always in sync is a major limitation of the split repo.
        >>>> My approach is usually:
        >>>>
        >>>> - Start the bisect in kernel
        >>>> - Check the commit date of the current HEAD
        >>>> - Check out to a commit of the same/similar date in apps
        >>>> - Build
        >>>> - Mentally note if this commit was good or bad based on the results of
        >>>>   running the image
        >>>> - make distclean (avoids artifacts carrying over between bisections and
        >>>>   breaking everything)
        >>>> - Mark commit good or bad with git bisect
        >>>>
        >>>> Then basically repeat this until bisecting is finished. It sucks and I
        >>>> did suggest a script in /tools/ to try and automate most of this, but I
        >>>> never got around to writing it.
        >>>>
        >>>> I would suggest you start by checking for the issue on a stable release
        >>>> (i.e. 12.12.0) to see if that's a good commit you can start from. Usually
        >>>> those releases have a higher degree of testing because everyone who voted
        >>>> for the release ran some images on their hardware.
        >>>>
        >>>> That's honestly a lot of work but you never know if it'll end up being
        >>>> faster than trying to triage with logs!
        >>>>
        >>>> Matteo
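
The "match apps to the kernel commit date" step in the list above could be sketched roughly as follows. This is not the /tools/ script mentioned there (which was never written), just a minimal illustration: the helper names, the directory arguments, and the default branch name "master" are all assumptions, and the build, run, and "git bisect good/bad" steps are left manual.

```python
import subprocess


def head_commit_date_cmd():
    """git command that prints the committer date of HEAD in strict ISO 8601."""
    return ["git", "show", "-s", "--format=%cI", "HEAD"]


def matching_apps_commit_cmd(date_iso, branch="master"):
    """git command that prints the newest commit on `branch` not after `date_iso`."""
    return ["git", "rev-list", "-1", f"--before={date_iso}", branch]


def run(cmd, cwd):
    """Run a command in `cwd` and return its stripped stdout (raises on failure)."""
    result = subprocess.run(cmd, cwd=cwd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def sync_apps_to_kernel(nuttx_dir, apps_dir, branch="master"):
    """Check out the apps commit closest in date to the kernel's current HEAD.

    Intended to be called after each `git bisect` step in the kernel repo,
    before running `make distclean` and rebuilding.
    """
    date_iso = run(head_commit_date_cmd(), nuttx_dir)
    apps_rev = run(matching_apps_commit_cmd(date_iso, branch), apps_dir)
    if not apps_rev:
        raise RuntimeError(f"no apps commit on {branch} at or before {date_iso}")
    run(["git", "checkout", "--detach", apps_rev], apps_dir)
    return apps_rev
```

Matching by date is only a heuristic, so a commit that fails to build may need to be skipped with "git bisect skip" rather than marked bad.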
        >>>>
        >>>> On Fri, Feb 6, 2026, 4:50 PM Nathan Hartman <[email protected]> wrote:
        >>>>
        >>>>> First place I would look: is the stack overflowing? (You could try
        >>>>> enabling some of the stack debugging features.)
        >>>>>
        >>>>> On Fri, Feb 6, 2026 at 4:34 PM Peter Barada <[email protected]> wrote:
        >>>>>
        >>>>>> Matteo,
        >>>>>>
        >>>>>> I don't know if this was working before, but if you can suggest a good
        >>>>>> starting point I can cycle through git bisect to narrow down to the
        >>>>>> failing commit.  What's the best approach to using git bisect across
        >>>>>> multiple repos (since changes in nuttx may have necessary changes in
        >>>>>> nuttx-apps and need to keep them in sync at each build point)?
        >>>>>>
        >>>>>> As an aside, I also have a nucleo-f446re board, and 'time ls' works fine
        >>>>>> there.
        >>>>>>
        >>>>>> Further, does anyone have GDB scripts that make it easier to decipher
        >>>>>> NuttX structures from memory (e.g. dump task/semaphore lists, etc.)?
        >>>>>> I've started cobbling snippets together but figured I'd ask before
        >>>>>> reinventing the wheel.
        >>>>>>
        >>>>>>
        >>>>>> On 2/6/26 16:12, Matteo Golin wrote:
        >>>>>>> Hi Peter,
        >>>>>>>
        >>>>>>> If you happen to know that this was working before on an older NuttX
        >>>>>>> version, you could use git bisect to narrow down the breaking commit.
        >>>>>>> Then the issue might be clearer.
        >>>>>>>
        >>>>>>> Best,
        >>>>>>> Matteo
        >>>>>>>
        >>>>>>> On Fri, Feb 6, 2026, 4:09 PM Peter Barada <[email protected]> wrote:
        >>>>>>>      I have a STM32 Nucleo-h753zi board and configured a build for
        >>>>>>>      nucleo-h743zi2:nsh (which is the closest board/chip; the stm32h753zi
        >>>>>>>      is the same as the stm32h743zi, but the h753zi includes crypto
        >>>>>>>      acceleration hardware).
        >>>>>>>      The build works, but if I boot and try 'time ls', nuttx faults:
        >>>>>>>
        >>>>>>>      nsh> uname -a
        >>>>>>>      NuttX 0.0.0 9ecfff0833 Feb  6 2026 15:45:28 arm nucleo-h743zi2
        >>>>>>>      nsh> time ls
        >>>>>>>      /:
        >>>>>>>        dev/
        >>>>>>>
        >>>>>>>      0.00dump_assert_info: Current Version: NuttX 0.0.0 9ecfff0833 Feb  6 2026 15:45:28 arm
        >>>>>>>      dump_assert_info: Assertion failed panic: at file: :0 task: <noname> process: <noname> 0x800c9fd
        >>>>>>>      up_dump_register: R0: 0801e624 R1: 0000000a R2: 00000050  R3: 0000000a
        >>>>>>>      up_dump_register: R4: 00000001 R5: 240000e4 R6: 00000000  FP: 00000000
        >>>>>>>      up_dump_register: R8: 00000000 SB: 00000000 SL: 00000000 R11: 00000000
        >>>>>>>      up_dump_register: IP: 00000000 SP: 38000c08 LR: 080059db  PC: 08005984
        >>>>>>>      up_dump_register: xPSR: 41000000 BASEPRI: 00000000 CONTROL: 00000000
        >>>>>>>      up_dump_register: EXC_RETURN: ffffffe9
        >>>>>>>      dump_stackinfo: User Stack:
        >>>>>>>      dump_stackinfo:  base: 0x38000518
        >>>>>>>      dump_stackinfo:  size: 00002000
        >>>>>>>      dump_stackinfo:    sp: 0x38000c08
        >>>>>>>      stack_dump: 0x38000be8: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
        >>>>>>>      stack_dump: 0x38000c08: 0000000a 0801e624 0801e624 38000200 38000fac 00000000 0801e624 080172c1
        >>>>>>>      stack_dump: 0x38000c28: 00000000 0801e624 38000200 38000158 00000000 00000000 38000fac 0800caa1
        >>>>>>>      stack_dump: 0x38000c48: 00000000 0800cc77 0801e624 000002fc 38000500 00000001 00000001 38000cf0
        >>>>>>>      stack_dump: 0x38000c68: 38000cf0 00000008 38000200 00000000 00000000 0800ca79 38000500 00000001
        >>>>>>>      stack_dump: 0x38000c88: 00000064 38000cf0 00000064 0800ca33 38000500 00000001 00000064 00000000
        >>>>>>>      stack_dump: 0x38000ca8: 00000000 08009325 00000000 38000500 00000001 0800c9fd 00000000 080052f1
        >>>>>>>      stack_dump: 0x38000cc8: 00000000 38000500 00000000 38000158 00000001 00000001 00000000 00000000
        >>>>>>>      stack_dump: 0x38000ce8: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
        >>>>>>>      dump_tasks: PID GROUP PRI POLICY   TYPE    NPX STATE  EVENT  SIGMASK           STACKBASE  STACKSIZE   COMMAND
        >>>>>>>      dump_task:  0     0   0 FIFO     Kthread -   Ready         0000000000000000 0x240018b0      1000   <noname>
        >>>>>>>      dump_task:  1     1 100 RR       Task    -   Running       0000000000000000 0x38000518      2000   <noname>
        ��]���&
        >>>>>>>
        >>>>>>>      Wondering if anyone has run across this before?  Backtrace shows:
        >>>>>>>      Program received signal SIGTRAP, Trace/breakpoint trap.
        >>>>>>>      exception_common () at armv7-m/arm_exception.S:127
        >>>>>>>      127  mrs             r0, ipsr           /* R0=exception number */
        >>>>>>>      where
        >>>>>>>      #0  exception_common () at armv7-m/arm_exception.S:127
        >>>>>>>      #1  <signal handler called>
        >>>>>>>      #2  0x08005984 in env_cmpname (pszname=0x801e624 "PS1",
        >>>>>>>           peqname=0xa <error: Cannot access memory at address 0xa>)
        >>>>>>>           at environ/env_findvar.c:50
        >>>>>>>      #3  0x080059da in env_findvar (group=0x38000200, pname=0x801e624 "PS1")
        >>>>>>>           at environ/env_findvar.c:105
        >>>>>>>      #4  0x080172c0 in getenv (name=0x801e624 "PS1") at environ/env_getenv.c:89
        >>>>>>>      #5  0x0800caa0 in nsh_update_prompt () at nsh_prompt.c:77
        >>>>>>>      #6  0x0800cc76 in nsh_session (pstate=0x38000cf0, login=1, argc=1,
        >>>>>>>           argv=0x38000500) at nsh_session.c:249
        >>>>>>>      #7  0x0800ca78 in nsh_consolemain (argc=1, argv=0x38000500)
        >>>>>>>           at nsh_consolemain.c:77
        >>>>>>>      #8  0x0800ca32 in nsh_main (argc=1, argv=0x38000500) at nsh_main.c:76
        >>>>>>>      #9  0x08009324 in nxtask_startup (entrypt=0x800c9fd <nsh_main>, argc=1,
        >>>>>>>           argv=0x38000500) at sched/task_startup.c:72
        >>>>>>>      #10 0x080052f0 in nxtask_start () at task/task_start.c:104
        >>>>>>>      #11 0x00000000 in ?? ()
        >>>>>>>
        >>>>>>>      Scratching the surface shows that env_findvar() is called with a group
        >>>>>>>      pointer of 0x38000200, and group->tg_envp is 0x380004b8, both of which
        >>>>>>>      are reasonable. But *group->tg_envp is 0xA.  Further, if I "watch
        >>>>>>>      *(int*)0x380004b8" in GDB, I see it is getting overwritten by
        >>>>>>>      up_serialout() invoked from stm32_serial.c::up_send.
        >>>>>>>
        >>>>>>>      Any suggestions on how I can best track this down further?
        >>>>>>>
        >>>>>>>      Thanks in advance!
        >>>>>>>
        >>>>>>>      --
        >>>>>>>      Peter Barada
        >>>>>>> [email protected]
        >>>>>>>
        >>>>>> --
        >>>>>> Peter Barada
        >>>>>> [email protected]
        >>>>>>
        >>>>> --
        >>>> Peter [email protected]
        >>>>
        >>>> --
        >>> Peter [email protected]
        >>>
        >>>
-- Peter Barada
        [email protected]

-- Peter Barada
    [email protected]

--
Peter Barada
[email protected]
