On Wed, Feb 5, 2025 at 1:06 PM chao an <magicd...@gmail.com> wrote:

> Hi,
>
> The behavior of spin_lock needs everyone's advice
>
> After PR14578 <https://github.com/apache/nuttx/pull/14578> was merged into
> the NuttX, the behavior of spin_lock() and spin_lock_irqsave() added the
> feature of disable the preemption:
>
> https://github.com/apache/nuttx/pull/14578
>
> The spin lock behavior changed from:
>
>
> *spin_lock: spin lockspin_lock_irqsave: spin lock + irqsave*
>
> to:
>
>
> *spin_lock: spin lock + sched_lockspin_lock_irqsave: spin lock + irqsave +
> sched_lock*
>
> This change has two key issues
> 1. *Crash*: Since spin_lock depends on sched_lock, the code using this type
> of API needs to re-evaluate the scope of impact, especially when tcb is not
> initialized at the startup stage, sched_lock will cause crash
>
> https://github.com/apache/nuttx/pull/15728
>
>
It isn't an initialization problem, the real cause is some code abusing
spin lock(lock/unlock in the different thread).
After holding sched_lock in spinlock_irqsave, the api requires that the
lock/unlock come from the same thread.

2. *Performance degradation*: spin_lock/spin_lock_irqsave is widely used in
> the kernel, and more than 90% of the code protected by spin_lock will not
> cause context switching, so after the introduction of sched_lock, the
> kernel part has introduced new performance overhead
>
> https://github.com/apache/nuttx/pull/15684
>

> Due to the change in API semantics, we need to further optimize all the
> locations in the kernel that use spin_lock()/spin_lock_irqsave() to solve
> performance issue in evolution.
>

All performance critical code is evaluating carefully to skip the
sched_lock/sched_unlock, like this:
https://github.com/apache/nuttx/pull/15695
And the optimization of sched_lock is under heavy development.


> In PR15705 <https://github.com/apache/nuttx/pull/15705>, Xiaoxiang had
> some
> arguments about API naming with me.
>
> https://github.com/apache/nuttx/pull/15705
>
> My point of view is to restore the original behavior of spin_lock() and
> spin_lock_irqsave(), which brings the following benefits:
> 1. The API semantics remain unchanged, and independent developers and
> projects that use these APIs outside the nuttx repository can keep their
> code without any adjustment
> 2. The API naming is consistent with the internal implementation, and the
> caller can know what is happening inside the function from the API naming
>
> *Option 1:*
>
>
>
> *spin_lock:                   spin lockspin_lock_nopreempt:
> spin_lock + sched_lockspin_lock_irqsave:           spin lock + irq
> savespin_lock_irqsave_nopreempt: spin_lock + irq save + sched_lock*
>
> @xiaoxiang suggested changing the semantics of the API to hold sched_lock
> by default, consistent with Linux:
> 1. This means that all locations in the kernel that call spin_lock and
> spin_lock_irqsave need to be changed
>

Since holding sched locks make spinlock more safe, the change doesn't
require the change from the caller in general.


> spin_lock_irqsave -> spin_lock_irqsave_preempt
> 2. It is impossible to know what happens inside the function from the API
> naming
>
> *Option 2:*
>
>
>
> *spin_lock:                 spin lock + sched_lockspin_lock_preempt:
>   spin_lockspin_lock_irqsave:         spin lock + irq save +
> sched_lockspin_lock_irqsave_preempt: spin_lock + irq save*
>
> I don't know which definition is better, or if any wise person has a better
> choice, please give some advice.
>

Here is the summary why I prefer to add sched_lock in
spin_lock/spin_lock_irqsave by default:

   - The default(short) api should be safe since not all people understand
   the schedule behaviour deeply, please see the related discussion:

https://github.com/apache/nuttx/issues/9531
https://github.com/apache/nuttx/issues/1138
https://www.kernel.org/doc/Documentation/preempt-locking.txt


   - spin_lock without sched_lock is unusable in the normal thread context,
   since the scheduler may suspend the spinlock owner thread at any time
   - API semantic align with Linux to avoid Linux developer use the unsafe
   api without any notification
   - the caller of enter_critical section can call sem_post/mq_send without
   problem, spin_lock/spin_lock_irqsave can only achieve the same behaviour by
   holding sched lock

Reply via email to