WOW! Nice catch! Question to some people with more experience in the scheduler:
Why wasn't this issue detected before? It was added more than 9 months ago.
Is there some way to test and enforce that nxsched_process_delivered() and
other scheduler functions are working as expected?

BR,

Alan

On Tue, May 6, 2025 at 11:14 AM Serg Podtynnyi <s...@podtynnyi.com.invalid> wrote:
> Hi, I am still researching the sched problem on my config.
>
> Found this macro code from last September. Looks like a pre vs. prev
> typo, right?
>
> #define dq_insert_mid(pre, mid, next) \
>   do \
>     { \
>       mid->flink  = next; \
>       mid->blink  = prev; \
>       pre->flink  = mid; \
>       next->blink = mid; \
>     } \
>   while (0)
>
> On 4/24/25 15:06, hujun260 wrote:
> > The lockcount can be understood as a local variable, so no race
> > conditions will occur.
> >
> > At 2025-04-24 15:54:48, "Serg Podtynnyi" <s...@podtynnyi.com.INVALID> wrote:
> >> Hi, I tried my best to understand the logic behind the unlock and
> >> merge_pending routines and came up with this fix that works for me.
> >> I split the lockcount handling in two parts: when lockcount is > 1,
> >> just decrement fast as always, but when it is 1 we enter the critical
> >> section and make it 0 inside it (pre-emption is enabled).
> >> I think there is a small window between the decrement in the if
> >> statement and enter_critical_section_wo_note() in the original code
> >> where a race can happen. Does this make sense? Sorry if it's a dumb
> >> question - scheduler stuff with SMP is rocket-science level for me yet.
> >>
> >> diff --git a/sched/sched/sched_unlock.c b/sched/sched/sched_unlock.c
> >> index 4c7d681454..15fc39b546 100644
> >> --- a/sched/sched/sched_unlock.c
> >> +++ b/sched/sched/sched_unlock.c
> >> @@ -65,13 +65,14 @@ void sched_unlock(void)
> >>
> >>    DEBUGASSERT(rtcb == NULL || rtcb->lockcount > 0);
> >>
> >> -  /* Check if the lock counter has decremented to zero. If so,
> >> -   * then pre-emption has been re-enabled.
> >> -   */
> >> -
> >> -  if (rtcb != NULL && --rtcb->lockcount == 0)
> >> +  if (rtcb != NULL && rtcb->lockcount > 1)
> >> +    {
> >> +      rtcb->lockcount--;
> >> +    }
> >> +  else if (rtcb != NULL && rtcb->lockcount == 1)
> >>      {
> >>        irqstate_t flags = enter_critical_section_wo_note();
> >> +      rtcb->lockcount = 0;
> >>
> >>        /* Note that we no longer have pre-emption disabled. */
> >>
> >> --
> >> 2.49.0
> >>
> >> On 4/22/25 14:32, hujun260 wrote:
> >>
> >> It seems to have nothing to do with the race condition, and since the
> >> rtcb is this_task, it will only be modified by one CPU.
> >>
> >> As for why your modification can solve the problem, I haven't figured
> >> it out yet. Can you further analyze the reasons for the failure to boot?
> >>
> >> At 2025-04-22 13:25:13, "Serg Podtynnyi" <s...@podtynnyi.com.INVALID> wrote:
> >>
> >>> Hi All, Gregory, Sebastien,
> >>>
> >>> I'm porting NuttX to the rp2350 (pico2 board inside the ClockWork
> >>> PicoCalc) and am having trouble getting the SMP (2-core) config
> >>> running without patching the sched lock/unlock routines.
> >>>
> >>> I am looking at the commit
> >>> https://github.com/apache/nuttx/commit/914ae532e68bf4ca75c758d6f541a1d2fcdce8c3
> >>> and reverting it helps to boot the system.
> >>>
> >>> This is the patch that makes it work for me. I updated the current
> >>> lock/unlock by moving the enter_critical_section above the access to
> >>> rtcb->lockcount.
> >>>
> >>> Do you think it could be a case of a race condition with
> >>> rtcb->lockcount?
> >>>
> >>> PS
> >>>
> >>> It's almost 11 years since I last touched NuttX - it's a pleasure to
> >>> work with it again.
> >>>
> >>> From 86fa0a65fdd37804154faf3db52e2826281cdfbd Mon Sep 17 00:00:00 2001
> >>> From: Serg Podtynnyi <s...@podtynnyi.com>
> >>> Date: Tue, 22 Apr 2025 11:37:48 +0700
> >>> Subject: sched: update lock/unlock critical section entry
> >>>
> >>> Level up enter_critical_section in sched lock/unlock to cover
> >>> rtcb->lockcount access
> >>>
> >>> Signed-off-by: Serg Podtynnyi <s...@podtynnyi.com>
> >>> ---
> >>>  sched/sched/sched_lock.c   | 4 ++--
> >>>  sched/sched/sched_unlock.c | 4 ++--
> >>>  2 files changed, 4 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/sched/sched/sched_lock.c b/sched/sched/sched_lock.c
> >>> index d8f75e7c90..9cc8da3df4 100644
> >>> --- a/sched/sched/sched_lock.c
> >>> +++ b/sched/sched/sched_lock.c
> >>> @@ -82,18 +82,18 @@ void sched_lock(void)
> >>>         * operations on this thread (on any CPU)
> >>>         */
> >>>
> >>> +      irqstate_t flags = enter_critical_section_wo_note();
> >>>        if (rtcb != NULL && rtcb->lockcount++ == 0)
> >>>          {
> >>>  #if (CONFIG_SCHED_CRITMONITOR_MAXTIME_PREEMPTION >= 0) || \
> >>>      defined(CONFIG_SCHED_INSTRUMENTATION_PREEMPTION)
> >>> -          irqstate_t flags = enter_critical_section_wo_note();
> >>>
> >>>            /* Note that we have pre-emption locked */
> >>>
> >>>            nxsched_critmon_preemption(rtcb, true, return_address(0));
> >>>            sched_note_preemption(rtcb, true);
> >>> -          leave_critical_section_wo_note(flags);
> >>>  #endif
> >>>          }
> >>> +      leave_critical_section_wo_note(flags);
> >>>      }
> >>>  }
> >>>
> >>> diff --git a/sched/sched/sched_unlock.c b/sched/sched/sched_unlock.c
> >>> index 4c7d681454..d21e007587 100644
> >>> --- a/sched/sched/sched_unlock.c
> >>> +++ b/sched/sched/sched_unlock.c
> >>> @@ -69,9 +69,9 @@ void sched_unlock(void)
> >>>        * then pre-emption has been re-enabled.
> >>>        */
> >>>
> >>> +     irqstate_t flags = enter_critical_section_wo_note();
> >>>       if (rtcb != NULL && --rtcb->lockcount == 0)
> >>>         {
> >>> -         irqstate_t flags = enter_critical_section_wo_note();
> >>>
> >>>           /* Note that we no longer have pre-emption disabled. */
> >>>
> >>> @@ -162,7 +162,7 @@ void sched_unlock(void)
> >>>         }
> >>>  #endif
> >>>
> >>> -         leave_critical_section_wo_note(flags);
> >>>     }
> >>> +  leave_critical_section_wo_note(flags);
> >>>  }
> >>> }
> >>> --
> >>> 2.49.0
> >>>
> >>> --
> >>> Serg Podtynnyi
> >>>
> >> --
> >> Serg Podtynnyi
> > --
> Serg Podtynnyi