This file:
https://github.com/apache/nuttx/commit/9de9f8168d6de8eab8d3a97aac21aacc4e84dd84#diff-ac1d3cade3d9ab380d40fe31f8c006b3035c7b53d04c28e23d9f1ce9a176572c

On Tue, May 6, 2025 at 1:18 PM Alan C. Assis <acas...@gmail.com> wrote:

> Hi Serg,
>
> I did an analysis here and this macro was introduced in Jan 10 2024, so it
> was more than 1 year ago:
>
>
> https://github.com/apache/nuttx/commit/9de9f8168d6de8eab8d3a97aac21aacc4e84dd84
>
> BR,
>
> Alan
>
> On Tue, May 6, 2025 at 1:12 PM Alan C. Assis <acas...@gmail.com> wrote:
>
>> WOW! Nice catch!
>>
>> Question to some people with more experience in the scheduler:
>>
>> Why wasn't this issue detected before? It was added more than 9 months
>> ago.
>>
>> Is there some way to test and enforce that nxsched_process_delivered()
>> and other schedule functions are working as expected?
>>
>> BR,
>>
>> Alan
>>
>> On Tue, May 6, 2025 at 11:14 AM Serg Podtynnyi <s...@podtynnyi.com.invalid>
>> wrote:
>>
>>> Hi, I am still researching the sched problem on my config
>>>
>>> Found this macro code  from last September,  looks like pre vs prev
>>> typo, right?
>>>
>>> #definedq_insert_mid(pre,mid,next)\
>>> do\
>>> {\
>>> mid->flink =next;\
>>> mid->blink =prev;\
>>> pre->flink =mid;\
>>> next->blink =mid;\
>>> }\
>>> while(0)
>>>
>>>
>>>
>>> On 4/24/25 15:06, hujun260 wrote:
>>> > The lockcount can be understood as a local variable, so no race
>>> conditions will occur.
>>> > 编辑
>>> > 分享
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > At 2025-04-24 15:54:48, "Serg Podtynnyi"<s...@podtynnyi.com.INVALID>
>>> wrote:
>>> >> Hi, I tried my best to understand the logic behind unlock and
>>> >> merge_pending routines and come up with this fix that works for me.
>>> >> I split the lockcount handling in 2 parts when lockcount is > 1 just
>>> >> decrement fast as always,
>>> >> but when it's 1 we enter critical section and make it 0 inside
>>> >> it(pre-emption is enabled).
>>> >> I think there is a small time after decrement in the if statement and
>>> >>   enter_critical_section_wo_note() in original code where can race
>>> happen.
>>> >> Does this make sense? Sorry if it's a dumb question - scheduler stuff
>>> >> with SMP is rocket-science level for me yet.
>>> >>
>>> >> diff --git a/sched/sched/sched_unlock.c b/sched/sched/sched_unlock.c
>>> >>
>>> >> index 4c7d681454..15fc39b546 100644
>>> >> --- a/sched/sched/sched_unlock.c
>>> >> +++ b/sched/sched/sched_unlock.c
>>> >> @@ -65,13 +65,14 @@ void sched_unlock(void)
>>> >>
>>> >>         DEBUGASSERT(rtcb == NULL || rtcb->lockcount > 0);
>>> >>
>>> >> -      /* Check if the lock counter has decremented to zero. If so,
>>> >> -       * then pre-emption has been re-enabled.
>>> >> -       */
>>> >> -
>>> >> -      if (rtcb != NULL && --rtcb->lockcount == 0)
>>> >> +      if (rtcb != NULL && rtcb->lockcount > 1)
>>> >> +        {
>>> >> +          rtcb->lockcount--;
>>> >> +        }
>>> >> +      else if (rtcb != NULL && rtcb->lockcount == 1)
>>> >>           {
>>> >>             irqstate_t flags = enter_critical_section_wo_note();
>>> >> +          rtcb->lockcount  = 0;
>>> >>
>>> >>             /* Note that we no longer have pre-emption disabled. */
>>> >>
>>> >> --
>>> >> 2.49.0
>>> >> On 4/22/25 14:32, hujun260 wrote:
>>> >>
>>> >> It seems to have nothing to do with the race condition, and since the
>>> rtcb is this_task, it will only be modified by one CPU.<br/><br/>As for why
>>> your modification can solve the problem, I haven't figured it out yet. Can
>>> you further analyze the reasons for the failure to boot?
>>> >> At 2025-04-22 13:25:13, "Serg Podtynnyi"<s...@podtynnyi.com.INVALID>
>>> wrote:
>>> >>
>>> >>> Hi All, Gregory, Sebastien,
>>> >>>
>>> >>> I'm porting NuttX on rp2350(pico2 board inside ClockWork PicoCalc)
>>> and
>>> >>> having trouble getting SMP (2-core) config running
>>> >>>
>>> >>> without patching the sched lock/unlock routines.
>>> >>>
>>> >>> I am looking at the commit
>>> >>>
>>> https://github.com/apache/nuttx/commit/914ae532e68bf4ca75c758d6f541a1d2fcdce8c3
>>> >>> and reverting it helps to boot the system.
>>> >>>
>>> >>> This the the patch to make it work for me, I updated the current
>>> >>> lock/unlock by moving the enter_critical_section above the access to
>>> >>> rtcb->lockcount
>>> >>>
>>> >>> Do you think it could be the case of a race condition with
>>> rtcb->lockcount ?
>>> >>>
>>> >>> PS
>>> >>>
>>> >>> It's almost 11 years since I last touched NuttX - it's a please to
>>> work
>>> >>> with it again.
>>> >>>
>>> >>>
>>> >>>   From 86fa0a65fdd37804154faf3db52e2826281cdfbd Mon Sep 17 00:00:00
>>> 2001
>>> >>> From: Serg Podtynnyi<s...@podtynnyi.com>
>>> >>> Date: Tue, 22 Apr 2025 11:37:48 +0700
>>> >>> Subject: sched: update lock/unlock critical section entry
>>> >>>
>>> >>> Level up enter_critical_section in sched lock/unlock to cover
>>> >>> rtcb->lockcount access
>>> >>>
>>> >>> Signed-off-by: Serg Podtynnyi<s...@podtynnyi.com>
>>> >>> ---
>>> >>>    sched/sched/sched_lock.c   | 4 ++--
>>> >>>    sched/sched/sched_unlock.c | 4 ++--
>>> >>>    2 files changed, 4 insertions(+), 4 deletions(-)
>>> >>>
>>> >>> diff --git a/sched/sched/sched_lock.c b/sched/sched/sched_lock.c
>>> >>> index d8f75e7c90..9cc8da3df4 100644
>>> >>> --- a/sched/sched/sched_lock.c
>>> >>> +++ b/sched/sched/sched_lock.c
>>> >>> @@ -82,18 +82,18 @@ void sched_lock(void)
>>> >>>           * operations on this thread (on any CPU)
>>> >>>           */
>>> >>>
>>> >>> +      irqstate_t flags = enter_critical_section_wo_note();
>>> >>>          if (rtcb != NULL && rtcb->lockcount++ == 0)
>>> >>>            {
>>> >>>    #if (CONFIG_SCHED_CRITMONITOR_MAXTIME_PREEMPTION >= 0) || \
>>> >>>        defined(CONFIG_SCHED_INSTRUMENTATION_PREEMPTION)
>>> >>> -          irqstate_t flags = enter_critical_section_wo_note();
>>> >>>
>>> >>>              /* Note that we have pre-emption locked */
>>> >>>
>>> >>>              nxsched_critmon_preemption(rtcb, true,
>>> return_address(0));
>>> >>>              sched_note_preemption(rtcb, true);
>>> >>> -          leave_critical_section_wo_note(flags);
>>> >>>    #endif
>>> >>>            }
>>> >>> +      leave_critical_section_wo_note(flags);
>>> >>>        }
>>> >>>    }
>>> >>> diff --git a/sched/sched/sched_unlock.c b/sched/sched/sched_unlock.c
>>> >>> index 4c7d681454..d21e007587 100644
>>> >>> --- a/sched/sched/sched_unlock.c
>>> >>> +++ b/sched/sched/sched_unlock.c
>>> >>> @@ -69,9 +69,9 @@ void sched_unlock(void)
>>> >>>           * then pre-emption has been re-enabled.
>>> >>>           */
>>> >>>
>>> >>> +      irqstate_t flags = enter_critical_section_wo_note();
>>> >>>          if (rtcb != NULL && --rtcb->lockcount == 0)
>>> >>>            {
>>> >>> -          irqstate_t flags = enter_critical_section_wo_note();
>>> >>>
>>> >>>              /* Note that we no longer have pre-emption disabled. */
>>> >>>
>>> >>> @@ -162,7 +162,7 @@ void sched_unlock(void)
>>> >>>                }
>>> >>>    #endif
>>> >>>
>>> >>> -          leave_critical_section_wo_note(flags);
>>> >>>            }
>>> >>> +      leave_critical_section_wo_note(flags);
>>> >>>        }
>>> >>>    }
>>> >>> --
>>> >>> 2.49.0
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Serg Podtynnyi
>>> >>>
>>> >> -- Serg Podtynnyi
>>>
>>> --
>>> Serg Podtynnyi
>>>
>>>

Reply via email to