Yeah, it's usually the stack, but does anyone know why it needs to be
enlarged now? Is something using more stack than before?
On Sat, Feb 7, 2026 at 5:28 PM Peter Barada <[email protected]> wrote:
Cranking up CONFIG_INIT_STACKSIZE to 3072 fixes the issue.
I tried enabling STACK_COLORATION, STACK_USAGE, and ARMV7M_STACKTRACE
while leaving INIT_STACKSIZE at 2048, hoping to debug with
STM32CubeIDE, but when I try "time ls" the GDB session is lost (which
seems strange).
If I then enable ARMV7M_STACKCHECK_BREAKPOINT, GDB stops when it
detects the stack overflow and I can get a call stack to understand
why, but I can't continue (to show the dump).
Finally, after enabling ARCH_STACKDUMP, ARMV7M_STACKCHECK,
SCHED_BACKTRACE, STACK_COLORATION, and STACK_USAGE, disabling
STACKCHECK_BREAKPOINT, and setting ARCH_INTERRUPTSTACK=2048 and
ARCH_STACKDUMP_MAX_LENGTH=1024, I get a full dump when it detects the
stack overflow.
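
For reference, the final combination described above might look like
the defconfig fragment below. This is only a sketch: the CONFIG_ prefix
is the usual Kconfig convention, the symbol names are copied from the
message, STACKCHECK_BREAKPOINT is assumed to mean
ARMV7M_STACKCHECK_BREAKPOINT, and INIT_STACKSIZE is assumed to stay at
2048 so that the overflow can still be reproduced.

```
# Assumed defconfig fragment; names and values taken from the message.
CONFIG_INIT_STACKSIZE=2048
CONFIG_ARCH_STACKDUMP=y
CONFIG_ARCH_STACKDUMP_MAX_LENGTH=1024
CONFIG_ARCH_INTERRUPTSTACK=2048
CONFIG_ARMV7M_STACKCHECK=y
# CONFIG_ARMV7M_STACKCHECK_BREAKPOINT is not set
CONFIG_SCHED_BACKTRACE=y
CONFIG_STACK_COLORATION=y
CONFIG_STACK_USAGE=y
```

Setting CONFIG_INIT_STACKSIZE=3072 instead is the change reported to
make the fault go away.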
Thanks for the help!
On 2/7/26 03:25, raiden00pl wrote:
> hi, this is a 100% stack issue. Increase all stack sizes to at least 4092.
> Another option is to enable full optimisation with CONFIG_DEBUG_FULLOPT=y,
> should also help.
>
> quick tip: about 80% of crashes in NuttX are stack issues, the first thing
> you always do when such crashes occur is to increase all stack sizes :)
>
> On Sat, 7 Feb 2026 at 04:02, Matteo Golin <[email protected]> wrote:
>
>> I am not familiar enough, but there should be an option for stack
>> canaries. I haven't had much luck with that configuration, and I
>> imagine that your DEBUGASSERT will trigger before stack smashing is
>> detected.
>>
>> Matteo
>>
>> On Fri, Feb 6, 2026, 8:45 PM Peter Barada <[email protected]> wrote:
>>
>>> Haven't tried yet (personally I feel I should know _why_ it happens) -
>>> is there a config for compiling in stack checking on function entry?
>>> On 2/6/26 20:22, Matteo Golin wrote:
>>>
>>> Hmmm, if the problem goes that far back it may not be worth triaging
>>> that way. Things have probably diverged so much since then. No luck
>>> with the stack increase?
>>>
>>> Matteo
>>>
>>> On Fri, Feb 6, 2026, 8:18 PM Peter Barada <[email protected]> wrote:
>>>> Matteo,
>>>>
>>>> I'm walking back release points and have had to change board
>>>> configuration names (to nucleo-h743zi), rename nuttx-apps to apps,
>>>> and am still seeing the fault in the release/11.0 branch.
>>>>
>>>> I'm trying to go back further but wondering if I'll find a bisect
>>>> start point...
>>>> On 2/6/26 17:05, Matteo Golin wrote:
>>>>
>>>> Hi Peter,
>>>>
>>>> My approach is kind of a headache since bisecting over an area where
>>>> apps and NuttX are not always in sync is a major limitation of the
>>>> split repo.
>>>> My approach is usually:
>>>>
>>>> - Start the bisect in kernel
>>>> - Check the commit date of the current HEAD
>>>> - Check out to a commit of the same/similar date in apps
>>>> - Build
>>>> - Mentally note if this commit was good or bad based on the results
>>>>   of running the image
>>>> - make distclean (avoids artifacts carrying over between bisections
>>>>   and breaking everything)
>>>> - Mark commit good or bad with git bisect
>>>>
>>>> Then basically repeat this until bisecting is finished. It sucks and
>>>> I did suggest a script in /tools/ to try and automate most of this,
>>>> but I never got around to writing it.
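
The date-matching step in the list above can be sketched in Python.
This is only an illustration, not an existing NuttX tool: the helper
names, the repository paths, and the "master" branch name are
assumptions. The git subcommands themselves (git log -1 --format=%cI,
git rev-list -1 --before=..., git checkout --detach) are standard.

```python
import subprocess
from typing import List


def commit_date_cmd(repo: str, rev: str = "HEAD") -> List[str]:
    """Build the git command that prints the committer date of `rev` (ISO 8601)."""
    return ["git", "-C", repo, "log", "-1", "--format=%cI", rev]


def nearest_commit_cmd(repo: str, date: str, branch: str = "master") -> List[str]:
    """Build the git command that prints the newest commit on `branch` not after `date`."""
    return ["git", "-C", repo, "rev-list", "-1", f"--before={date}", branch]


def run(cmd: List[str]) -> str:
    """Run a command and return its stripped stdout; raises if it fails."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def sync_apps(kernel_repo: str, apps_repo: str, branch: str = "master") -> str:
    """Check out the apps commit closest in time to the kernel's current HEAD."""
    date = run(commit_date_cmd(kernel_repo))
    sha = run(nearest_commit_cmd(apps_repo, date, branch))
    if not sha:
        raise RuntimeError(f"no commit on {branch} at or before {date}")
    run(["git", "-C", apps_repo, "checkout", "--detach", sha])
    return sha
```

Calling sync_apps("nuttx", "apps") after each bisect step in the kernel
tree, followed by make distclean and a rebuild, would cover the
checkout part of the loop; marking the commit good or bad stays manual.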
>>>>
>>>> I would suggest you start by checking for the issue on a stable
>>>> release (e.g. 12.12.0) to see if that's a good commit you can start
>>>> from. Usually those releases have a higher degree of testing because
>>>> everyone who voted for the release ran some images on their hardware.
>>>>
>>>> That's honestly a lot of work but you never know if it'll end up
>>>> being faster than trying to triage with logs!
>>>>
>>>> Matteo
>>>>
>>>> On Fri, Feb 6, 2026, 4:50 PM Nathan Hartman <[email protected]> wrote:
>>>>
>>>>> First place I would look: is the stack overflowing? (You could try
>>>>> enabling some of the stack debugging features.)
>>>>>
>>>>> On Fri, Feb 6, 2026 at 4:34 PM Peter Barada <[email protected]> wrote:
>>>>>
>>>>>> Matteo,
>>>>>>
>>>>>> I don't know if this was working before but if you can suggest a
>>>>>> good starting point I can cycle through git bisect to narrow down
>>>>>> to the failing commit. What's the best approach to using git bisect
>>>>>> across multiple repos (since changes in nuttx may have necessary
>>>>>> changes in nuttx-apps and need to keep them in sync at each build
>>>>>> point)?
>>>>>>
>>>>>> As an aside, I also have a nucleo-f446re board, and 'time ls' works
>>>>>> fine there.
>>>>>>
>>>>>> Further, does anyone have GDB scripts that make it easier to
>>>>>> decipher NuttX structures from memory (e.g. dump task/semaphore
>>>>>> lists, etc.)? I've started cobbling snippets together but figured
>>>>>> I'd ask before reinventing the wheel.
>>>>>>
>>>>>>
>>>>>> On 2/6/26 16:12, Matteo Golin wrote:
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> If you happen to know that this was working before on an older
>>>>>>> NuttX version, you could use git bisect to narrow down the
>>>>>>> breaking commit. Then the issue might be clearer.
>>>>>>>
>>>>>>> Best,
>>>>>>> Matteo
>>>>>>>
>>>>>>> On Fri, Feb 6, 2026, 4:09 PM Peter Barada <[email protected]> wrote:
>>>>>>> I have an STM32 Nucleo-h753zi board and configured a build for
>>>>>>> nucleo-h743zi2:nsh (which is the closest board/chip; the
>>>>>>> stm32h753zi is the same as the stm32h743zi, but the h753zi
>>>>>>> includes crypto acceleration hardware). The build works, but if I
>>>>>>> boot and try 'time ls', nuttx faults:
>>>>>>>
>>>>>>> nsh> uname -a
>>>>>>> NuttX 0.0.0 9ecfff0833 Feb 6 2026 15:45:28 arm nucleo-h743zi2
>>>>>>> nsh> time ls
>>>>>>> /:
>>>>>>> dev/
>>>>>>>
>>>>>>> 0.00dump_assert_info: Current Version: NuttX 0.0.0 9ecfff0833 Feb 6 2026 15:45:28 arm
>>>>>>> dump_assert_info: Assertion failed panic: at file: :0 task: <noname> process: <noname> 0x800c9fd
>>>>>>> up_dump_register: R0: 0801e624 R1: 0000000a R2: 00000050 R3: 0000000a
>>>>>>> up_dump_register: R4: 00000001 R5: 240000e4 R6: 00000000 FP: 00000000
>>>>>>> up_dump_register: R8: 00000000 SB: 00000000 SL: 00000000 R11: 00000000
>>>>>>> up_dump_register: IP: 00000000 SP: 38000c08 LR: 080059db PC: 08005984
>>>>>>> up_dump_register: xPSR: 41000000 BASEPRI: 00000000 CONTROL: 00000000
>>>>>>> up_dump_register: EXC_RETURN: ffffffe9
>>>>>>> dump_stackinfo: User Stack:
>>>>>>> dump_stackinfo: base: 0x38000518
>>>>>>> dump_stackinfo: size: 00002000
>>>>>>> dump_stackinfo: sp: 0x38000c08
>>>>>>> stack_dump: 0x38000be8: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>>>>>> stack_dump: 0x38000c08: 0000000a 0801e624 0801e624 38000200 38000fac 00000000 0801e624 080172c1
>>>>>>> stack_dump: 0x38000c28: 00000000 0801e624 38000200 38000158 00000000 00000000 38000fac 0800caa1
>>>>>>> stack_dump: 0x38000c48: 00000000 0800cc77 0801e624 000002fc 38000500 00000001 00000001 38000cf0
>>>>>>> stack_dump: 0x38000c68: 38000cf0 00000008 38000200 00000000 00000000 0800ca79 38000500 00000001
>>>>>>> stack_dump: 0x38000c88: 00000064 38000cf0 00000064 0800ca33 38000500 00000001 00000064 00000000
>>>>>>> stack_dump: 0x38000ca8: 00000000 08009325 00000000 38000500 00000001 0800c9fd 00000000 080052f1
>>>>>>> stack_dump: 0x38000cc8: 00000000 38000500 00000000 38000158 00000001 00000001 00000000 00000000
>>>>>>> stack_dump: 0x38000ce8: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>>>>>> dump_tasks: PID GROUP PRI POLICY TYPE NPX STATE EVENT SIGMASK STACKBASE STACKSIZE COMMAND
>>>>>>> dump_task: 0 0 0 FIFO Kthread - Ready 0000000000000000 0x240018b0 1000 <noname>
>>>>>>> dump_task: 1 1 100 RR Task - Running 0000000000000000 0x38000518 2000 <noname> ��]���&
>>>>>>>
>>>>>>> Wondering if anyone has run across this before? Backtrace shows:
>>>>>>> Program received signal SIGTRAP, Trace/breakpoint trap.
>>>>>>> exception_common () at armv7-m/arm_exception.S:127
>>>>>>> 127 mrs r0, ipsr /* R0=exception number */
>>>>>>> where
>>>>>>> #0 exception_common () at armv7-m/arm_exception.S:127
>>>>>>> #1 <signal handler called>
>>>>>>> #2 0x08005984 in env_cmpname (pszname=0x801e624 "PS1", peqname=0xa <error: Cannot access memory at address 0xa>) at environ/env_findvar.c:50
>>>>>>> #3 0x080059da in env_findvar (group=0x38000200, pname=0x801e624 "PS1") at environ/env_findvar.c:105
>>>>>>> #4 0x080172c0 in getenv (name=0x801e624 "PS1") at environ/env_getenv.c:89
>>>>>>> #5 0x0800caa0 in nsh_update_prompt () at nsh_prompt.c:77
>>>>>>> #6 0x0800cc76 in nsh_session (pstate=0x38000cf0, login=1, argc=1, argv=0x38000500) at nsh_session.c:249
>>>>>>> #7 0x0800ca78 in nsh_consolemain (argc=1, argv=0x38000500) at nsh_consolemain.c:77
>>>>>>> #8 0x0800ca32 in nsh_main (argc=1, argv=0x38000500) at nsh_main.c:76
>>>>>>> #9 0x08009324 in nxtask_startup (entrypt=0x800c9fd <nsh_main>, argc=1, argv=0x38000500) at sched/task_startup.c:72
>>>>>>> #10 0x080052f0 in nxtask_start () at task/task_start.c:104
>>>>>>> #11 0x00000000 in ?? ()
>>>>>>>
>>>>>>> Scratching the surface shows that env_findvar() is called with a
>>>>>>> group pointer of 0x38000200, and group->tg_envp is 0x380004b8,
>>>>>>> both of which are reasonable. But *group->tg_envp is 0xA. Further,
>>>>>>> if I "watch *(int*)0x380004b8" in GDB, I see it is getting
>>>>>>> overwritten by up_serialout() invoked from
>>>>>>> stm32_serial.c::up_send.
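
A quick arithmetic check on the addresses reported above, written as a
small Python sketch. The constants are copied from the dump and the GDB
session; the stack size is assumed to be decimal bytes (as in the
dump_tasks STACKSIZE column), and the helper names are made up for
illustration.

```python
STACK_BASE = 0x38000518  # dump_stackinfo "base" (lowest address of the stack)
STACK_SIZE = 2000        # dump_tasks STACKSIZE, assumed to be decimal bytes
SP = 0x38000C08          # dump_stackinfo "sp" at the time of the assertion
TG_ENVP = 0x380004B8     # group->tg_envp, the word GDB saw being overwritten


def in_stack(addr: int) -> bool:
    """True if addr lies inside [base, base + size)."""
    return STACK_BASE <= addr < STACK_BASE + STACK_SIZE


def bytes_below_base(addr: int) -> int:
    """Distance from addr up to the stack base (positive means below the stack)."""
    return STACK_BASE - addr


if __name__ == "__main__":
    print("sp inside stack:      ", in_stack(SP))
    print("tg_envp inside stack: ", in_stack(TG_ENVP))
    print("tg_envp below base by:", hex(bytes_below_base(TG_ENVP)))
```

The overwritten word sits 0x60 (96) bytes below the stack base, while
the saved SP is still inside the allocation. On ARMv7-M the stack grows
toward lower addresses, so this layout is at least consistent with a
deeper call chain (here the serial output path) having run past the
base earlier and written into the memory just below it.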
>>>>>>>
>>>>>>> Any suggestions on how I can best track this down further?
>>>>>>>
>>>>>>> Thanks in advance!
>>>>>>>
>>>>>>> --
>>>>>>> Peter Barada
>>>>>>> [email protected]
>>>>>>>
--
Peter Barada
[email protected]