True, so this typo didn't create a side effect, right? Are you git bisecting to find the root cause of the issue?
BR, Alan On Wed, May 7, 2025 at 12:08 AM Serg Podtynnyi <s...@podtynnyi.com.invalid> wrote: > Yeah, but this macro is working because of coincidental match with > variable prev already defined in the scope. > > On 5/6/25 23:19, Alan C. Assis wrote: > > This file: > > > https://github.com/apache/nuttx/commit/9de9f8168d6de8eab8d3a97aac21aacc4e84dd84#diff-ac1d3cade3d9ab380d40fe31f8c006b3035c7b53d04c28e23d9f1ce9a176572c > > > > On Tue, May 6, 2025 at 1:18 PM Alan C. Assis <acas...@gmail.com> wrote: > > > >> Hi Serg, > >> > >> I did an analysis here and this macro was introduced in Jan 10 2024, so > it > >> was more than 1 year ago: > >> > >> > >> > https://github.com/apache/nuttx/commit/9de9f8168d6de8eab8d3a97aac21aacc4e84dd84 > >> > >> BR, > >> > >> Alan > >> > >> On Tue, May 6, 2025 at 1:12 PM Alan C. Assis <acas...@gmail.com> wrote: > >> > >>> WOW! Nice catch! > >>> > >>> Question to some people with more experience in the scheduler: > >>> > >>> Why wasn't this issue detected before? It was added more than 9 months > >>> ago. > >>> > >>> Is there some way to test and enforce that nxsched_process_delivered() > >>> and other schedule functions are working as expected? > >>> > >>> BR, > >>> > >>> Alan > >>> > >>> On Tue, May 6, 2025 at 11:14 AM Serg Podtynnyi > <s...@podtynnyi.com.invalid> > >>> wrote: > >>> > >>>> Hi, I am still researching the sched problem on my config > >>>> > >>>> Found this macro code from last September, looks like pre vs prev > >>>> typo, right? > >>>> > >>>> #definedq_insert_mid(pre,mid,next)\ > >>>> do\ > >>>> {\ > >>>> mid->flink =next;\ > >>>> mid->blink =prev;\ > >>>> pre->flink =mid;\ > >>>> next->blink =mid;\ > >>>> }\ > >>>> while(0) > >>>> > >>>> > >>>> > >>>> On 4/24/25 15:06, hujun260 wrote: > >>>>> The lockcount can be understood as a local variable, so no race > >>>> conditions will occur. > >>>>> 编辑 > >>>>> 分享 > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> At 2025-04-24 15:54:48, "Serg Podtynnyi"<s...@podtynnyi.com.INVALID> > >>>> wrote: > >>>>>> Hi, I tried my best to understand the logic behind unlock and > >>>>>> merge_pending routines and come up with this fix that works for me. > >>>>>> I split the lockcount handling in 2 parts when lockcount is > 1 just > >>>>>> decrement fast as always, > >>>>>> but when it's 1 we enter critical section and make it 0 inside > >>>>>> it(pre-emption is enabled). > >>>>>> I think there is a small time after decrement in the if statement > and > >>>>>> enter_critical_section_wo_note() in original code where can race > >>>> happen. > >>>>>> Does this make sense? Sorry if it's a dumb question - scheduler > stuff > >>>>>> with SMP is rocket-science level for me yet. > >>>>>> > >>>>>> diff --git a/sched/sched/sched_unlock.c b/sched/sched/sched_unlock.c > >>>>>> > >>>>>> index 4c7d681454..15fc39b546 100644 > >>>>>> --- a/sched/sched/sched_unlock.c > >>>>>> +++ b/sched/sched/sched_unlock.c > >>>>>> @@ -65,13 +65,14 @@ void sched_unlock(void) > >>>>>> > >>>>>> DEBUGASSERT(rtcb == NULL || rtcb->lockcount > 0); > >>>>>> > >>>>>> - /* Check if the lock counter has decremented to zero. If so, > >>>>>> - * then pre-emption has been re-enabled. > >>>>>> - */ > >>>>>> - > >>>>>> - if (rtcb != NULL && --rtcb->lockcount == 0) > >>>>>> + if (rtcb != NULL && rtcb->lockcount > 1) > >>>>>> + { > >>>>>> + rtcb->lockcount--; > >>>>>> + } > >>>>>> + else if (rtcb != NULL && rtcb->lockcount == 1) > >>>>>> { > >>>>>> irqstate_t flags = enter_critical_section_wo_note(); > >>>>>> + rtcb->lockcount = 0; > >>>>>> > >>>>>> /* Note that we no longer have pre-emption disabled. */ > >>>>>> > >>>>>> -- > >>>>>> 2.49.0 > >>>>>> On 4/22/25 14:32, hujun260 wrote: > >>>>>> > >>>>>> It seems to have nothing to do with the race condition, and since > the > >>>> rtcb is this_task, it will only be modified by one CPU.<br/><br/>As > for why > >>>> your modification can solve the problem, I haven't figured it out > yet. Can > >>>> you further analyze the reasons for the failure to boot? > >>>>>> At 2025-04-22 13:25:13, "Serg Podtynnyi"<s...@podtynnyi.com > .INVALID> > >>>> wrote: > >>>>>>> Hi All, Gregory, Sebastien, > >>>>>>> > >>>>>>> I'm porting NuttX on rp2350(pico2 board inside ClockWork PicoCalc) > >>>> and > >>>>>>> having trouble getting SMP (2-core) config running > >>>>>>> > >>>>>>> without patching the sched lock/unlock routines. > >>>>>>> > >>>>>>> I am looking at the commit > >>>>>>> > >>>> > https://github.com/apache/nuttx/commit/914ae532e68bf4ca75c758d6f541a1d2fcdce8c3 > >>>>>>> and reverting it helps to boot the system. > >>>>>>> > >>>>>>> This the the patch to make it work for me, I updated the current > >>>>>>> lock/unlock by moving the enter_critical_section above the access > to > >>>>>>> rtcb->lockcount > >>>>>>> > >>>>>>> Do you think it could be the case of a race condition with > >>>> rtcb->lockcount ? > >>>>>>> PS > >>>>>>> > >>>>>>> It's almost 11 years since I last touched NuttX - it's a please to > >>>> work > >>>>>>> with it again. > >>>>>>> > >>>>>>> > >>>>>>> From 86fa0a65fdd37804154faf3db52e2826281cdfbd Mon Sep 17 > 00:00:00 > >>>> 2001 > >>>>>>> From: Serg Podtynnyi<s...@podtynnyi.com> > >>>>>>> Date: Tue, 22 Apr 2025 11:37:48 +0700 > >>>>>>> Subject: sched: update lock/unlock critical section entry > >>>>>>> > >>>>>>> Level up enter_critical_section in sched lock/unlock to cover > >>>>>>> rtcb->lockcount access > >>>>>>> > >>>>>>> Signed-off-by: Serg Podtynnyi<s...@podtynnyi.com> > >>>>>>> --- > >>>>>>> sched/sched/sched_lock.c | 4 ++-- > >>>>>>> sched/sched/sched_unlock.c | 4 ++-- > >>>>>>> 2 files changed, 4 insertions(+), 4 deletions(-) > >>>>>>> > >>>>>>> diff --git a/sched/sched/sched_lock.c b/sched/sched/sched_lock.c > >>>>>>> index d8f75e7c90..9cc8da3df4 100644 > >>>>>>> --- a/sched/sched/sched_lock.c > >>>>>>> +++ b/sched/sched/sched_lock.c > >>>>>>> @@ -82,18 +82,18 @@ void sched_lock(void) > >>>>>>> * operations on this thread (on any CPU) > >>>>>>> */ > >>>>>>> > >>>>>>> + irqstate_t flags = enter_critical_section_wo_note(); > >>>>>>> if (rtcb != NULL && rtcb->lockcount++ == 0) > >>>>>>> { > >>>>>>> #if (CONFIG_SCHED_CRITMONITOR_MAXTIME_PREEMPTION >= 0) || \ > >>>>>>> defined(CONFIG_SCHED_INSTRUMENTATION_PREEMPTION) > >>>>>>> - irqstate_t flags = enter_critical_section_wo_note(); > >>>>>>> > >>>>>>> /* Note that we have pre-emption locked */ > >>>>>>> > >>>>>>> nxsched_critmon_preemption(rtcb, true, > >>>> return_address(0)); > >>>>>>> sched_note_preemption(rtcb, true); > >>>>>>> - leave_critical_section_wo_note(flags); > >>>>>>> #endif > >>>>>>> } > >>>>>>> + leave_critical_section_wo_note(flags); > >>>>>>> } > >>>>>>> } > >>>>>>> diff --git a/sched/sched/sched_unlock.c > b/sched/sched/sched_unlock.c > >>>>>>> index 4c7d681454..d21e007587 100644 > >>>>>>> --- a/sched/sched/sched_unlock.c > >>>>>>> +++ b/sched/sched/sched_unlock.c > >>>>>>> @@ -69,9 +69,9 @@ void sched_unlock(void) > >>>>>>> * then pre-emption has been re-enabled. > >>>>>>> */ > >>>>>>> > >>>>>>> + irqstate_t flags = enter_critical_section_wo_note(); > >>>>>>> if (rtcb != NULL && --rtcb->lockcount == 0) > >>>>>>> { > >>>>>>> - irqstate_t flags = enter_critical_section_wo_note(); > >>>>>>> > >>>>>>> /* Note that we no longer have pre-emption disabled. > */ > >>>>>>> > >>>>>>> @@ -162,7 +162,7 @@ void sched_unlock(void) > >>>>>>> } > >>>>>>> #endif > >>>>>>> > >>>>>>> - leave_critical_section_wo_note(flags); > >>>>>>> } > >>>>>>> + leave_critical_section_wo_note(flags); > >>>>>>> } > >>>>>>> } > >>>>>>> -- > >>>>>>> 2.49.0 > >>>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> Serg Podtynnyi > >>>>>>> > >>>>>> -- Serg Podtynnyi > >>>> -- > >>>> Serg Podtynnyi > >>>> > >>>> > -- > Serg Podtynnyi > >