WOW! Nice catch! Question to some people with more experience in the scheduler:
Why wasn't this issue detected before? It was added more than 9 months ago.
Is there some way to test and enforce that nxsched_process_delivered() and
other scheduler functions are working as expected?

BR,

Alan

On Tue, May 6, 2025 at 11:14 AM Serg Podtynnyi <s...@podtynnyi.com.invalid> wrote:
> Hi, I am still researching the sched problem on my config.
>
> Found this macro code from last September. Looks like a pre vs. prev
> typo, right?
>
> #define dq_insert_mid(pre, mid, next) \
>   do \
>     { \
>       mid->flink  = next; \
>       mid->blink  = prev; \
>       pre->flink  = mid; \
>       next->blink = mid; \
>     } \
>   while (0)
>
> On 4/24/25 15:06, hujun260 wrote:
> > The lockcount can be understood as a local variable, so no race
> > conditions will occur.
> >
> > At 2025-04-24 15:54:48, "Serg Podtynnyi" <s...@podtynnyi.com.INVALID> wrote:
> >> Hi, I tried my best to understand the logic behind the unlock and
> >> merge_pending routines and came up with this fix that works for me.
> >> I split the lockcount handling in two parts: when lockcount is > 1,
> >> just decrement fast as always, but when it is 1 we enter the critical
> >> section and make it 0 inside it (pre-emption is enabled).
> >> I think there is a small window between the decrement in the if
> >> statement and enter_critical_section_wo_note() in the original code
> >> where a race can happen. Does this make sense? Sorry if it's a dumb
> >> question - scheduler stuff with SMP is rocket-science level for me yet.
> >>
> >> diff --git a/sched/sched/sched_unlock.c b/sched/sched/sched_unlock.c
> >> index 4c7d681454..15fc39b546 100644
> >> --- a/sched/sched/sched_unlock.c
> >> +++ b/sched/sched/sched_unlock.c
> >> @@ -65,13 +65,14 @@ void sched_unlock(void)
> >>
> >>    DEBUGASSERT(rtcb == NULL || rtcb->lockcount > 0);
> >>
> >> -  /* Check if the lock counter has decremented to zero. If so,
> >> -   * then pre-emption has been re-enabled.
> >> -   */
> >> -
> >> -  if (rtcb != NULL && --rtcb->lockcount == 0)
> >> +  if (rtcb != NULL && rtcb->lockcount > 1)
> >> +    {
> >> +      rtcb->lockcount--;
> >> +    }
> >> +  else if (rtcb != NULL && rtcb->lockcount == 1)
> >>      {
> >>        irqstate_t flags = enter_critical_section_wo_note();
> >> +      rtcb->lockcount = 0;
> >>
> >>        /* Note that we no longer have pre-emption disabled. */
> >>
> >> --
> >> 2.49.0
> >>
> >> On 4/22/25 14:32, hujun260 wrote:
> >>
> >> It seems to have nothing to do with the race condition, and since the
> >> rtcb is this_task, it will only be modified by one CPU.
> >>
> >> As for why your modification can solve the problem, I haven't figured
> >> it out yet. Can you further analyze the reasons for the failure to boot?
> >>
> >> At 2025-04-22 13:25:13, "Serg Podtynnyi" <s...@podtynnyi.com.INVALID> wrote:
> >>
> >>> Hi All, Gregory, Sebastien,
> >>>
> >>> I'm porting NuttX to the rp2350 (pico2 board inside the ClockWork
> >>> PicoCalc) and am having trouble getting the SMP (2-core) config
> >>> running without patching the sched lock/unlock routines.
> >>>
> >>> I am looking at the commit
> >>> https://github.com/apache/nuttx/commit/914ae532e68bf4ca75c758d6f541a1d2fcdce8c3
> >>> and reverting it helps to boot the system.
> >>>
> >>> This is the patch that makes it work for me. I updated the current
> >>> lock/unlock by moving the enter_critical_section above the access to
> >>> rtcb->lockcount.
> >>>
> >>> Do you think it could be a case of a race condition with
> >>> rtcb->lockcount?
> >>>
> >>> PS
> >>>
> >>> It's almost 11 years since I last touched NuttX - it's a pleasure to
> >>> work with it again.
> >>>
> >>> From 86fa0a65fdd37804154faf3db52e2826281cdfbd Mon Sep 17 00:00:00 2001
> >>> From: Serg Podtynnyi <s...@podtynnyi.com>
> >>> Date: Tue, 22 Apr 2025 11:37:48 +0700
> >>> Subject: sched: update lock/unlock critical section entry
> >>>
> >>> Level up enter_critical_section in sched lock/unlock to cover
> >>> rtcb->lockcount access
> >>>
> >>> Signed-off-by: Serg Podtynnyi <s...@podtynnyi.com>
> >>> ---
> >>>  sched/sched/sched_lock.c   | 4 ++--
> >>>  sched/sched/sched_unlock.c | 4 ++--
> >>>  2 files changed, 4 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/sched/sched/sched_lock.c b/sched/sched/sched_lock.c
> >>> index d8f75e7c90..9cc8da3df4 100644
> >>> --- a/sched/sched/sched_lock.c
> >>> +++ b/sched/sched/sched_lock.c
> >>> @@ -82,18 +82,18 @@ void sched_lock(void)
> >>>         * operations on this thread (on any CPU)
> >>>         */
> >>>
> >>> +      irqstate_t flags = enter_critical_section_wo_note();
> >>>        if (rtcb != NULL && rtcb->lockcount++ == 0)
> >>>          {
> >>>  #if (CONFIG_SCHED_CRITMONITOR_MAXTIME_PREEMPTION >= 0) || \
> >>>      defined(CONFIG_SCHED_INSTRUMENTATION_PREEMPTION)
> >>> -          irqstate_t flags = enter_critical_section_wo_note();
> >>>
> >>>            /* Note that we have pre-emption locked */
> >>>
> >>>            nxsched_critmon_preemption(rtcb, true, return_address(0));
> >>>            sched_note_preemption(rtcb, true);
> >>> -          leave_critical_section_wo_note(flags);
> >>>  #endif
> >>>          }
> >>> +      leave_critical_section_wo_note(flags);
> >>>      }
> >>>  }
> >>>
> >>> diff --git a/sched/sched/sched_unlock.c b/sched/sched/sched_unlock.c
> >>> index 4c7d681454..d21e007587 100644
> >>> --- a/sched/sched/sched_unlock.c
> >>> +++ b/sched/sched/sched_unlock.c
> >>> @@ -69,9 +69,9 @@ void sched_unlock(void)
> >>>        * then pre-emption has been re-enabled.
> >>>        */
> >>>
> >>> +     irqstate_t flags = enter_critical_section_wo_note();
> >>>       if (rtcb != NULL && --rtcb->lockcount == 0)
> >>>         {
> >>> -         irqstate_t flags = enter_critical_section_wo_note();
> >>>
> >>>           /* Note that we no longer have pre-emption disabled. */
> >>>
> >>> @@ -162,7 +162,7 @@ void sched_unlock(void)
> >>>         }
> >>>  #endif
> >>>
> >>> -         leave_critical_section_wo_note(flags);
> >>>     }
> >>> +  leave_critical_section_wo_note(flags);
> >>>  }
> >>> }
> >>> --
> >>> 2.49.0
> >>>
> >>> --
> >>> Serg Podtynnyi
> >>>
> >> --
> >> Serg Podtynnyi
> > --
> Serg Podtynnyi