Oh something to add; The serial output is typically messed up (usually when "help" is ran, or printf use). I tried to debug it, but the debugger "gives up / skips" on random instructions. Or fails to write variables. People seem to have accepted this behavior as normal, but I am very worried about it as it gives more clues to the issue from the bigger picture.
Op do 24 apr 2025 om 13:30 schreef Kevin Witteveen <kevinwit1...@gmail.com>: > Hi, > > I am just going to share this here too. > I experienced ostest fails both on RP2040 and RP2350. > Disabling SMP on RP2350 only moved the problem elsewhere. > It stopped the ostest fails for a bit, but when adding more applications > and threads, the issue comes back. > Sometimes it hangs, rarely it halts. All depends on how you built it, what > you put in it and in what order. > Typical memory corruption issues. > > It is harder to replicate though. However, the RP2x is very very unstable > from what I have experienced so far. > > I am not sure if we should focus all in on the SMP specifically. > > Op do 24 apr 2025 om 13:24 schreef Serg Podtynnyi > <s...@podtynnyi.com.invalid>: > >> Yes, I understood that, what if CPU A and CPU B enters the unlock routine >> with 1 lockcount on the same task and after the decrement they both go into >> preemption and manipulating with linked list task list? >> >> On April 24, 2025 8:06:36 AM UTC, hujun260 <hujun...@163.com> wrote: >> >The lockcount can be understood as a local variable, so no race >> conditions will occur. >> >编辑 >> >分享 >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> >At 2025-04-24 15:54:48, "Serg Podtynnyi" <s...@podtynnyi.com.INVALID> >> wrote: >> >>Hi, I tried my best to understand the logic behind unlock and >> >>merge_pending routines and come up with this fix that works for me. >> >>I split the lockcount handling in 2 parts when lockcount is > 1 just >> >>decrement fast as always, >> >>but when it's 1 we enter critical section and make it 0 inside >> >>it(pre-emption is enabled). >> >>I think there is a small time after decrement in the if statement and >> >> enter_critical_section_wo_note() in original code where can race >> happen. >> >>Does this make sense? Sorry if it's a dumb question - scheduler stuff >> >>with SMP is rocket-science level for me yet. >> >> >> >>diff --git a/sched/sched/sched_unlock.c b/sched/sched/sched_unlock.c >> >> >> >>index 4c7d681454..15fc39b546 100644 >> >>--- a/sched/sched/sched_unlock.c >> >>+++ b/sched/sched/sched_unlock.c >> >>@@ -65,13 +65,14 @@ void sched_unlock(void) >> >> >> >> DEBUGASSERT(rtcb == NULL || rtcb->lockcount > 0); >> >> >> >>- /* Check if the lock counter has decremented to zero. If so, >> >>- * then pre-emption has been re-enabled. >> >>- */ >> >>- >> >>- if (rtcb != NULL && --rtcb->lockcount == 0) >> >>+ if (rtcb != NULL && rtcb->lockcount > 1) >> >>+ { >> >>+ rtcb->lockcount--; >> >>+ } >> >>+ else if (rtcb != NULL && rtcb->lockcount == 1) >> >> { >> >> irqstate_t flags = enter_critical_section_wo_note(); >> >>+ rtcb->lockcount = 0; >> >> >> >> /* Note that we no longer have pre-emption disabled. */ >> >> >> >>-- >> >>2.49.0 >> >>On 4/22/25 14:32, hujun260 wrote: >> >> >> >>It seems to have nothing to do with the race condition, and since the >> rtcb is this_task, it will only be modified by one CPU.<br/><br/>As for why >> your modification can solve the problem, I haven't figured it out yet. Can >> you further analyze the reasons for the failure to boot? >> >>At 2025-04-22 13:25:13, "Serg Podtynnyi" <s...@podtynnyi.com.INVALID> >> wrote: >> >> >> >>> Hi All, Gregory, Sebastien, >> >>> >> >>> I'm porting NuttX on rp2350(pico2 board inside ClockWork PicoCalc) and >> >>> having trouble getting SMP (2-core) config running >> >>> >> >>> without patching the sched lock/unlock routines. >> >>> >> >>> I am looking at the commit >> >>> >> https://github.com/apache/nuttx/commit/914ae532e68bf4ca75c758d6f541a1d2fcdce8c3 >> >>> and reverting it helps to boot the system. >> >>> >> >>> This the the patch to make it work for me, I updated the current >> >>> lock/unlock by moving the enter_critical_section above the access to >> >>> rtcb->lockcount >> >>> >> >>> Do you think it could be the case of a race condition with >> rtcb->lockcount ? >> >>> >> >>> PS >> >>> >> >>> It's almost 11 years since I last touched NuttX - it's a please to >> work >> >>> with it again. >> >>> >> >>> >> >>> From 86fa0a65fdd37804154faf3db52e2826281cdfbd Mon Sep 17 00:00:00 >> 2001 >> >>> From: Serg Podtynnyi <s...@podtynnyi.com> >> >>> Date: Tue, 22 Apr 2025 11:37:48 +0700 >> >>> Subject: sched: update lock/unlock critical section entry >> >>> >> >>> Level up enter_critical_section in sched lock/unlock to cover >> >>> rtcb->lockcount access >> >>> >> >>> Signed-off-by: Serg Podtynnyi <s...@podtynnyi.com> >> >>> --- >> >>> sched/sched/sched_lock.c | 4 ++-- >> >>> sched/sched/sched_unlock.c | 4 ++-- >> >>> 2 files changed, 4 insertions(+), 4 deletions(-) >> >>> >> >>> diff --git a/sched/sched/sched_lock.c b/sched/sched/sched_lock.c >> >>> index d8f75e7c90..9cc8da3df4 100644 >> >>> --- a/sched/sched/sched_lock.c >> >>> +++ b/sched/sched/sched_lock.c >> >>> @@ -82,18 +82,18 @@ void sched_lock(void) >> >>> * operations on this thread (on any CPU) >> >>> */ >> >>> >> >>> + irqstate_t flags = enter_critical_section_wo_note(); >> >>> if (rtcb != NULL && rtcb->lockcount++ == 0) >> >>> { >> >>> #if (CONFIG_SCHED_CRITMONITOR_MAXTIME_PREEMPTION >= 0) || \ >> >>> defined(CONFIG_SCHED_INSTRUMENTATION_PREEMPTION) >> >>> - irqstate_t flags = enter_critical_section_wo_note(); >> >>> >> >>> /* Note that we have pre-emption locked */ >> >>> >> >>> nxsched_critmon_preemption(rtcb, true, return_address(0)); >> >>> sched_note_preemption(rtcb, true); >> >>> - leave_critical_section_wo_note(flags); >> >>> #endif >> >>> } >> >>> + leave_critical_section_wo_note(flags); >> >>> } >> >>> } >> >>> diff --git a/sched/sched/sched_unlock.c b/sched/sched/sched_unlock.c >> >>> index 4c7d681454..d21e007587 100644 >> >>> --- a/sched/sched/sched_unlock.c >> >>> +++ b/sched/sched/sched_unlock.c >> >>> @@ -69,9 +69,9 @@ void sched_unlock(void) >> >>> * then pre-emption has been re-enabled. >> >>> */ >> >>> >> >>> + irqstate_t flags = enter_critical_section_wo_note(); >> >>> if (rtcb != NULL && --rtcb->lockcount == 0) >> >>> { >> >>> - irqstate_t flags = enter_critical_section_wo_note(); >> >>> >> >>> /* Note that we no longer have pre-emption disabled. */ >> >>> >> >>> @@ -162,7 +162,7 @@ void sched_unlock(void) >> >>> } >> >>> #endif >> >>> >> >>> - leave_critical_section_wo_note(flags); >> >>> } >> >>> + leave_critical_section_wo_note(flags); >> >>> } >> >>> } >> >>> -- >> >>> 2.49.0 >> >>> >> >>> >> >>> -- >> >>> Serg Podtynnyi >> >>> >> >>-- Serg Podtynnyi >> >