Hi,

I am just going to share this here too.
I experienced ostest fails both on RP2040 and RP2350.
Disabling SMP on RP2350 only moved the problem elsewhere.
It stopped the ostest fails for a bit, but when adding more applications
and threads, the issue comes back.
Sometimes it hangs, rarely it halts. All depends on how you built it, what
you put in it and in what order.
Typical memory corruption issues.

It is harder to replicate though. However, the RP2x is very very unstable
from what I have experienced so far.

I am not sure if we should focus all in on the SMP specifically.

Op do 24 apr 2025 om 13:24 schreef Serg Podtynnyi
<s...@podtynnyi.com.invalid>:

> Yes, I understood that, what if CPU A and CPU B enters the unlock routine
> with 1 lockcount on the same task and after the decrement they both go into
> preemption and manipulating with linked list task list?
>
> On April 24, 2025 8:06:36 AM UTC, hujun260 <hujun...@163.com> wrote:
> >The lockcount can be understood as a local variable, so no race
> conditions will occur.
> >编辑
> >分享
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >At 2025-04-24 15:54:48, "Serg Podtynnyi" <s...@podtynnyi.com.INVALID>
> wrote:
> >>Hi, I tried my best to understand the logic behind unlock and
> >>merge_pending routines and come up with this fix that works for me.
> >>I split the lockcount handling in 2 parts when lockcount is > 1 just
> >>decrement fast as always,
> >>but when it's 1 we enter critical section and make it 0 inside
> >>it(pre-emption is enabled).
> >>I think there is a small time after decrement in the if statement and
> >>  enter_critical_section_wo_note() in original code where can race
> happen.
> >>Does this make sense? Sorry if it's a dumb question - scheduler stuff
> >>with SMP is rocket-science level for me yet.
> >>
> >>diff --git a/sched/sched/sched_unlock.c b/sched/sched/sched_unlock.c
> >>
> >>index 4c7d681454..15fc39b546 100644
> >>--- a/sched/sched/sched_unlock.c
> >>+++ b/sched/sched/sched_unlock.c
> >>@@ -65,13 +65,14 @@ void sched_unlock(void)
> >>
> >>        DEBUGASSERT(rtcb == NULL || rtcb->lockcount > 0);
> >>
> >>-      /* Check if the lock counter has decremented to zero. If so,
> >>-       * then pre-emption has been re-enabled.
> >>-       */
> >>-
> >>-      if (rtcb != NULL && --rtcb->lockcount == 0)
> >>+      if (rtcb != NULL && rtcb->lockcount > 1)
> >>+        {
> >>+          rtcb->lockcount--;
> >>+        }
> >>+      else if (rtcb != NULL && rtcb->lockcount == 1)
> >>          {
> >>            irqstate_t flags = enter_critical_section_wo_note();
> >>+          rtcb->lockcount  = 0;
> >>
> >>            /* Note that we no longer have pre-emption disabled. */
> >>
> >>--
> >>2.49.0
> >>On 4/22/25 14:32, hujun260 wrote:
> >>
> >>It seems to have nothing to do with the race condition, and since the
> rtcb is this_task, it will only be modified by one CPU.<br/><br/>As for why
> your modification can solve the problem, I haven't figured it out yet. Can
> you further analyze the reasons for the failure to boot?
> >>At 2025-04-22 13:25:13, "Serg Podtynnyi" <s...@podtynnyi.com.INVALID>
> wrote:
> >>
> >>> Hi All, Gregory, Sebastien,
> >>>
> >>> I'm porting NuttX on rp2350(pico2 board inside ClockWork PicoCalc) and
> >>> having trouble getting SMP (2-core) config running
> >>>
> >>> without patching the sched lock/unlock routines.
> >>>
> >>> I am looking at the commit
> >>>
> https://github.com/apache/nuttx/commit/914ae532e68bf4ca75c758d6f541a1d2fcdce8c3
> >>> and reverting it helps to boot the system.
> >>>
> >>> This the the patch to make it work for me, I updated the current
> >>> lock/unlock by moving the enter_critical_section above the access to
> >>> rtcb->lockcount
> >>>
> >>> Do you think it could be the case of a race condition with
> rtcb->lockcount ?
> >>>
> >>> PS
> >>>
> >>> It's almost 11 years since I last touched NuttX - it's a please to work
> >>> with it again.
> >>>
> >>>
> >>>  From 86fa0a65fdd37804154faf3db52e2826281cdfbd Mon Sep 17 00:00:00 2001
> >>> From: Serg Podtynnyi <s...@podtynnyi.com>
> >>> Date: Tue, 22 Apr 2025 11:37:48 +0700
> >>> Subject: sched: update lock/unlock critical section entry
> >>>
> >>> Level up enter_critical_section in sched lock/unlock to cover
> >>> rtcb->lockcount access
> >>>
> >>> Signed-off-by: Serg Podtynnyi <s...@podtynnyi.com>
> >>> ---
> >>>   sched/sched/sched_lock.c   | 4 ++--
> >>>   sched/sched/sched_unlock.c | 4 ++--
> >>>   2 files changed, 4 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/sched/sched/sched_lock.c b/sched/sched/sched_lock.c
> >>> index d8f75e7c90..9cc8da3df4 100644
> >>> --- a/sched/sched/sched_lock.c
> >>> +++ b/sched/sched/sched_lock.c
> >>> @@ -82,18 +82,18 @@ void sched_lock(void)
> >>>          * operations on this thread (on any CPU)
> >>>          */
> >>>
> >>> +      irqstate_t flags = enter_critical_section_wo_note();
> >>>         if (rtcb != NULL && rtcb->lockcount++ == 0)
> >>>           {
> >>>   #if (CONFIG_SCHED_CRITMONITOR_MAXTIME_PREEMPTION >= 0) || \
> >>>       defined(CONFIG_SCHED_INSTRUMENTATION_PREEMPTION)
> >>> -          irqstate_t flags = enter_critical_section_wo_note();
> >>>
> >>>             /* Note that we have pre-emption locked */
> >>>
> >>>             nxsched_critmon_preemption(rtcb, true, return_address(0));
> >>>             sched_note_preemption(rtcb, true);
> >>> -          leave_critical_section_wo_note(flags);
> >>>   #endif
> >>>           }
> >>> +      leave_critical_section_wo_note(flags);
> >>>       }
> >>>   }
> >>> diff --git a/sched/sched/sched_unlock.c b/sched/sched/sched_unlock.c
> >>> index 4c7d681454..d21e007587 100644
> >>> --- a/sched/sched/sched_unlock.c
> >>> +++ b/sched/sched/sched_unlock.c
> >>> @@ -69,9 +69,9 @@ void sched_unlock(void)
> >>>          * then pre-emption has been re-enabled.
> >>>          */
> >>>
> >>> +      irqstate_t flags = enter_critical_section_wo_note();
> >>>         if (rtcb != NULL && --rtcb->lockcount == 0)
> >>>           {
> >>> -          irqstate_t flags = enter_critical_section_wo_note();
> >>>
> >>>             /* Note that we no longer have pre-emption disabled. */
> >>>
> >>> @@ -162,7 +162,7 @@ void sched_unlock(void)
> >>>               }
> >>>   #endif
> >>>
> >>> -          leave_critical_section_wo_note(flags);
> >>>           }
> >>> +      leave_critical_section_wo_note(flags);
> >>>       }
> >>>   }
> >>> --
> >>> 2.49.0
> >>>
> >>>
> >>> --
> >>> Serg Podtynnyi
> >>>
> >>-- Serg Podtynnyi
>

Reply via email to