Re: STM32H7 crash

Peter Barada Sun, 08 Feb 2026 14:23:23 -0800

Nathan,

I believe you're misunderstanding me. I have two boards: nucleo-f446re and nucleo-h753zi. I created two separate builds from the same source hashes (nucleo-f446re:nsh and nucleo-h743zi2:nsh) using stock defconfigs, then ran each build on the corresponding board and tested "time ls" at boot. The nucleo-f446re:nsh build ran on the nucleo-f446re ("time ls" passes), while the nucleo-h743zi2:nsh build failed on the nucleo-h753zi board.

Yes, the only difference between the h743zi and h753zi chips is that the h753zi contains the crypto accelerator.


On 2/8/26 16:57, Nathan Hartman wrote:

Hi Peter,

That is interesting (and strange) indeed. IIRC the only difference between those two chips is that the 753 has built-in crypto accelerators while the 743 does not. I believe that a firmware image built for one will work correctly on the other (provided obviously that the firmware does not attempt to access the crypto accelerators).


Did you make a separate build for each chip?

Or did you flash an *identical* image to both boards with stack size = 2048, and the same image succeeded on the 753 and failed on the 743?

I'm asking because if it's an identical image, that would require a quite different debugging strategy than if it was a separate build for each chip.


Thanks,
Nathan

On Sun, Feb 8, 2026 at 3:11 PM Peter Barada <[email protected]> wrote:


    Nathan,

    What's strange is that the same master source (nuttx hash
    e83606732d5e71eb98a9eb544537dbbeb71aa58b, apps hash
    d48b45000d1d083082f7a1650f351573c36a87d0) with INIT_STACKSIZE=2048
    in the default .config fails on nucleo-h743zi2 (run on my
    nucleo-h753zi board) but passes on nucleo-f446re when I try "time
    ls".  I turned on all the stack checks just to be sure
    nucleo-f446re wasn't just "lucky".

    On 2/7/26 23:54, Nathan Hartman wrote:

    Yeah, it's usually the stack, but does anyone know why it needs
    to be enlarged now? Is something using more stack than before?

    On Sat, Feb 7, 2026 at 5:28 PM Peter Barada
    <[email protected]> wrote:

        Cranking up CONFIG_INIT_STACKSIZE to 3072 fixes the issue.

        I tried enabling STACK_COLORATION, STACK_USAGE, and
        ARMV7M_STACKTRACE while leaving INIT_STACKSIZE at 2048, hoping
        to debug using STM32CubeIDE, but when I try "time ls" the GDB
        session is lost (which seems strange).

        If I then enable ARMV7M_STACKCHECK_BREAKPOINT, GDB stops when
        it detects the stack overflow and I can get a call stack to
        understand why, but I can't continue (to show the dump).

        Finally, after enabling ARCH_STACKDUMP, ARMV7M_STACKCHECK,
        SCHED_BACKTRACE, STACK_COLORATION, and STACK_USAGE, disabling
        STACKCHECK_BREAKPOINT, and setting ARCH_INTERRUPTSTACK=2048 and
        ARCH_STACKDUMP_MAX_LENGTH=1024, I get a full dump when it
        detects the stack overflow.
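
Written as a defconfig fragment, that final combination might look like the sketch below. The CONFIG_ prefix is assumed from the CONFIG_INIT_STACKSIZE spelling used in this message, and STACKCHECK_BREAKPOINT is assumed to mean the ARMV7M_STACKCHECK_BREAKPOINT option named earlier; the symbol names are worth confirming in menuconfig for the tree being built.

```
# Assumed spellings: option names from this message with CONFIG_ prepended
CONFIG_INIT_STACKSIZE=2048
CONFIG_ARCH_STACKDUMP=y
CONFIG_ARCH_STACKDUMP_MAX_LENGTH=1024
CONFIG_ARCH_INTERRUPTSTACK=2048
CONFIG_ARMV7M_STACKCHECK=y
# CONFIG_ARMV7M_STACKCHECK_BREAKPOINT is not set
CONFIG_SCHED_BACKTRACE=y
CONFIG_STACK_COLORATION=y
CONFIG_STACK_USAGE=y
```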

        Thanks for the help!


        On 2/7/26 03:25, raiden00pl wrote:
        > hi, this is a 100% stack issue. Increase all stack sizes to
        > at least 4092.
        > Another option is to enable full optimisation with
        > CONFIG_DEBUG_FULLOPT=y, should also help.
        >
        > quick tip: about 80% of crashes in NuttX are stack issues,
        > the first thing you always do when such crashes occur is to
        > increase all stack sizes :)
        >
        > On Sat, Feb 7, 2026 at 04:02, Matteo Golin <[email protected]> wrote:
        >
        >> I am not familiar enough, but there should be an option for
        >> stack canaries. I haven't had much luck with that
        >> configuration, and I imagine that your DEBUGASSERT will
        >> trigger before stack smashing is detected.
        >>
        >> Matteo
        >>
        >> On Fri, Feb 6, 2026, 8:45 PM Peter Barada <[email protected]> wrote:
        >>
        >>> Haven't tried yet (personally I feel I should know _why_ it
        >>> happens) - is there a config for compiling in stack checking
        >>> on function entry?
        >>> On 2/6/26 20:22, Matteo Golin wrote:
        >>>
        >>> Hmmm, if the problem goes that far back it may not be worth
        >>> triaging that way. Things have probably diverged so much
        >>> since then. No luck with the stack increase?
        >>>
        >>> Matteo
        >>>
        >>> On Fri, Feb 6, 2026, 8:18 PM Peter Barada <[email protected]> wrote:
        >>>> Matteo,
        >>>>
        >>>> I'm walking back release points and have had to change board
        >>>> configuration names (to nucleo-h743zi), rename nuttx-apps to
        >>>> apps, and am still seeing the fault in the release/11.0 branch.
        >>>>
        >>>> I'm trying to go back further but wondering if I'll find a
        >>>> bisect start point...
        >>>> On 2/6/26 17:05, Matteo Golin wrote:
        >>>>
        >>>> Hi Peter,
        >>>>
        >>>> My approach is kind of a headache since bisecting over an
        >>>> area where apps and NuttX are not always in sync is a major
        >>>> limitation of the split repo. My approach is usually:
        >>>>
        >>>> - Start the bisect in kernel
        >>>> - Check the commit date of the current HEAD
        >>>> - Check out to a commit of the same/similar date in apps
        >>>> - Build
        >>>> - Mentally note if this commit was good or bad based on the
        >>>>   results of running the image
        >>>> - make distclean (avoids artifacts carrying over between
        >>>>   bisections and breaking everything)
        >>>> - Mark commit good or bad with git bisect
        >>>>
        >>>> Then basically repeat this until bisecting is finished. It
        >>>> sucks and I did suggest a script in /tools/ to try and
        >>>> automate most of this, but I never got around to writing it.
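
The date-matching step in the list above could be sketched roughly as follows. This is only an illustration, not the script proposed for /tools/: the helper names, the repository paths, and the choice of the `master` branch are all assumptions, and it relies only on standard git commands (`git show -s --format=%cI`, `git rev-list -1 --before=...`, `git checkout --detach`).

```python
import subprocess


def git(repo, *args):
    """Run a git command inside `repo` and return its stripped stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def head_commit_date(nuttx_repo):
    """Committer date of the current nuttx HEAD in strict ISO 8601 form."""
    return git(nuttx_repo, "show", "-s", "--format=%cI", "HEAD")


def matching_apps_commit_args(date, branch="master"):
    """git arguments selecting the newest commit on `branch` at or before `date`."""
    return ["rev-list", "-1", f"--before={date}", branch]


def sync_apps(nuttx_repo, apps_repo, branch="master"):
    """Check out the apps commit closest in time to the current nuttx HEAD.

    Hypothetical helper: `nuttx_repo` and `apps_repo` are paths to the two
    checkouts. Returns the apps commit hash that was checked out.
    """
    date = head_commit_date(nuttx_repo)
    commit = git(apps_repo, *matching_apps_commit_args(date, branch))
    git(apps_repo, "checkout", "--detach", commit)
    return commit
```

Calling `sync_apps("nuttx", "apps")` each time `git bisect` moves the nuttx HEAD would cover the "check the commit date" and "check out a similar date in apps" steps; `make distclean`, the build, and the good/bad verdict would still be done by hand.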
        >>>>
        >>>> I would suggest you start by checking for the issue on a
        >>>> stable release (i.e. 12.12.0) to see if that's a good commit
        >>>> you can start from. Usually those releases have a higher
        >>>> degree of testing because everyone who voted for the release
        >>>> ran some images on their hardware.
        >>>>
        >>>> That's honestly a lot of work but you never know if it'll end
        >>>> up being faster than trying to triage with logs!
        >>>>
        >>>> Matteo
        >>>>
        >>>> On Fri, Feb 6, 2026, 4:50 PM Nathan Hartman <[email protected]> wrote:
        >>>>
        >>>>> First place I would look: is the stack overflowing? (You
        >>>>> could try enabling some of the stack debugging features.)
        >>>>>
        >>>>> On Fri, Feb 6, 2026 at 4:34 PM Peter Barada <[email protected]> wrote:
        >>>>>
        >>>>>> Matteo,
        >>>>>>
        >>>>>> I don't know if this was working before, but if you can
        >>>>>> suggest a good starting point I can cycle through git
        >>>>>> bisect to narrow down to the failing commit.  What's the
        >>>>>> best approach to using git bisect across multiple repos
        >>>>>> (since changes in nuttx may have necessary changes in
        >>>>>> nuttx-apps and need to keep them in sync at each build
        >>>>>> point)?
        >>>>>>
        >>>>>> As an aside, I also have a nucleo-f446re board, and 'time
        >>>>>> ls' works fine there.
        >>>>>>
        >>>>>> Further, does anyone have GDB scripts that make it easier
        >>>>>> to decipher NuttX structures from memory (e.g. dump
        >>>>>> task/semaphore lists, etc)? I've started cobbling snippets
        >>>>>> but figured I'd ask before reinventing the wheel.
        >>>>>>
        >>>>>>
        >>>>>> On 2/6/26 16:12, Matteo Golin wrote:
        >>>>>>> Hi Peter,
        >>>>>>>
        >>>>>>> If you happen to know that this was working before on an
        >>>>>>> older NuttX version, you could use git bisect to narrow
        >>>>>>> down the breaking commit. Then the issue might be clearer.
        >>>>>>>
        >>>>>>> Best,
        >>>>>>> Matteo
        >>>>>>>
        >>>>>>> On Fri, Feb 6, 2026, 4:09 PM Peter Barada <[email protected]> wrote:
        >>>>>>>      I have a STM32 Nucleo-h753zi board and configured a
        >>>>>>>      build for nucleo-h743zi2:nsh (which is the closest
        >>>>>>>      board/chip; the stm32h753zi is the same as the
        >>>>>>>      stm32h743zi, but the h753zi includes crypto
        >>>>>>>      acceleration hardware). The build works, but if I
        >>>>>>>      boot and try 'time ls', nuttx faults:
        >>>>>>>
        >>>>>>>      nsh> uname -a
        >>>>>>>      NuttX 0.0.0 9ecfff0833 Feb  6 2026 15:45:28 arm nucleo-h743zi2
        >>>>>>>      nsh> time ls
        >>>>>>>      /:
        >>>>>>>        dev/
        >>>>>>>
        >>>>>>>      0.00dump_assert_info: Current Version: NuttX 0.0.0 9ecfff0833 Feb  6 2026 15:45:28 arm
        >>>>>>>      dump_assert_info: Assertion failed panic: at file: :0 task: <noname> process: <noname> 0x800c9fd
        >>>>>>>      up_dump_register: R0: 0801e624 R1: 0000000a R2: 00000050  R3: 0000000a
        >>>>>>>      up_dump_register: R4: 00000001 R5: 240000e4 R6: 00000000  FP: 00000000
        >>>>>>>      up_dump_register: R8: 00000000 SB: 00000000 SL: 00000000 R11: 00000000
        >>>>>>>      up_dump_register: IP: 00000000 SP: 38000c08 LR: 080059db  PC: 08005984
        >>>>>>>      up_dump_register: xPSR: 41000000 BASEPRI: 00000000 CONTROL: 00000000
        >>>>>>>      up_dump_register: EXC_RETURN: ffffffe9
        >>>>>>>      dump_stackinfo: User Stack:
        >>>>>>>      dump_stackinfo:  base: 0x38000518
        >>>>>>>      dump_stackinfo:  size: 00002000
        >>>>>>>      dump_stackinfo:    sp: 0x38000c08
        >>>>>>>      stack_dump: 0x38000be8: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
        >>>>>>>      stack_dump: 0x38000c08: 0000000a 0801e624 0801e624 38000200 38000fac 00000000 0801e624 080172c1
        >>>>>>>      stack_dump: 0x38000c28: 00000000 0801e624 38000200 38000158 00000000 00000000 38000fac 0800caa1
        >>>>>>>      stack_dump: 0x38000c48: 00000000 0800cc77 0801e624 000002fc 38000500 00000001 00000001 38000cf0
        >>>>>>>      stack_dump: 0x38000c68: 38000cf0 00000008 38000200 00000000 00000000 0800ca79 38000500 00000001
        >>>>>>>      stack_dump: 0x38000c88: 00000064 38000cf0 00000064 0800ca33 38000500 00000001 00000064 00000000
        >>>>>>>      stack_dump: 0x38000ca8: 00000000 08009325 00000000 38000500 00000001 0800c9fd 00000000 080052f1
        >>>>>>>      stack_dump: 0x38000cc8: 00000000 38000500 00000000 38000158 00000001 00000001 00000000 00000000
        >>>>>>>      stack_dump: 0x38000ce8: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
        >>>>>>>      dump_tasks: PID GROUP PRI POLICY   TYPE    NPX STATE  EVENT  SIGMASK   STACKBASE  STACKSIZE   COMMAND
        >>>>>>>      dump_task:  0     0   0 FIFO     Kthread -   Ready  0000000000000000 0x240018b0      1000   <noname>
        >>>>>>>      dump_task:  1     1 100 RR       Task    -   Running  0000000000000000 0x38000518      2000   <noname>
        >>>>>>>
        >>>>>>>      Wondering if anyone has run across this before?  Backtrace shows:
        >>>>>>>      Program received signal SIGTRAP, Trace/breakpoint trap.
        >>>>>>>      exception_common () at armv7-m/arm_exception.S:127
        >>>>>>>      127  mrs             r0, ipsr           /* R0=exception number */
        >>>>>>>      where
        >>>>>>>      #0  exception_common () at armv7-m/arm_exception.S:127
        >>>>>>>      #1  <signal handler called>
        >>>>>>>      #2  0x08005984 in env_cmpname (pszname=0x801e624 "PS1",
        >>>>>>>           peqname=0xa <error: Cannot access memory at address 0xa>)
        >>>>>>>           at environ/env_findvar.c:50
        >>>>>>>      #3  0x080059da in env_findvar (group=0x38000200, pname=0x801e624 "PS1")
        >>>>>>>           at environ/env_findvar.c:105
        >>>>>>>      #4  0x080172c0 in getenv (name=0x801e624 "PS1") at environ/env_getenv.c:89
        >>>>>>>      #5  0x0800caa0 in nsh_update_prompt () at nsh_prompt.c:77
        >>>>>>>      #6  0x0800cc76 in nsh_session (pstate=0x38000cf0, login=1, argc=1,
        >>>>>>>           argv=0x38000500) at nsh_session.c:249
        >>>>>>>      #7  0x0800ca78 in nsh_consolemain (argc=1, argv=0x38000500)
        >>>>>>>           at nsh_consolemain.c:77
        >>>>>>>      #8  0x0800ca32 in nsh_main (argc=1, argv=0x38000500) at nsh_main.c:76
        >>>>>>>      #9  0x08009324 in nxtask_startup (entrypt=0x800c9fd <nsh_main>, argc=1,
        >>>>>>>           argv=0x38000500) at sched/task_startup.c:72
        >>>>>>>      #10 0x080052f0 in nxtask_start () at task/task_start.c:104
        >>>>>>>      #11 0x00000000 in ?? ()
        >>>>>>>
        >>>>>>>      Scratching the surface shows that env_findvar() is
        >>>>>>>      called with a group pointer of 0x38000200, and
        >>>>>>>      group->tg_envp is 0x380004b8, both of which are
        >>>>>>>      reasonable. But *group->tg_envp is 0xA.  Further, if I
        >>>>>>>      "watch *(int*)0x380004b8" in GDB, I see it is getting
        >>>>>>>      overwritten by up_serialout() invoked from
        >>>>>>>      stm32_serial.c::up_send.
        >>>>>>>      Any suggestions on how I can best track this down
        >>>>>>>      further?
        >>>>>>>
        >>>>>>>      Thanks in advance!
        >>>>>>>
        >>>>>>>      --
        >>>>>>>      Peter Barada
        >>>>>>> [email protected]
        >>>>>>>
        >>>>>> --
        >>>>>> Peter Barada
        >>>>>> [email protected]
        >>>>>>
        >>>>> --
        >>>> Peter [email protected]
        >>>>
        >>>> --
        >>> Peter [email protected]
        >>>
        >>>

--Peter Barada

        [email protected]

--Peter Barada

    [email protected]

--
Peter Barada
[email protected]
