Re: The behavior of spin_lock needs everyone's advice

Tomek CEDRO Wed, 05 Feb 2025 17:33:13 -0800

Thank you guys for bringing this discussion on the mailing list! We
really should first discuss any breaking change on the list long
enough until everyone is happy.. and we are sure not to break anything
:-)

Suggestions to verify in depth on a real world hardware any subtle
issue will be soon our standard requirement. For now we need to take
care of this manually. Staging environment could be a good choice too.

This is true, many projects, even industrial ones (I will make one
soon big scale, I did already small scale) depend on NuttX API
stability, that means long term self-compatibility and trust, this is
as important as runtime stability, maybe even more in these
applications. This is something unique to NuttX and we should really
respect and nurture :-)

Not many of us have this privilege to be paid for every commit, but if
things get broken all of us will loose money and customers and trust.
Therefore we should really avoid any breaking changes.. and focusing
on alternatives instead.

>From "user" point of view, if the change breaks kernel api, it should
be rejected, also reverted if already applied. If this is the case
right now and reverting will fix the release we should revert the
change and make a patch release 12.8.1. Is this the main problem right
now and will that fix 12.8.0? Then we should go for it :-)

For the later part I need to quote two parts so we have a solid
reference point :-)

Part 1:

On Wed, Feb 5, 2025 at 12:59 PM Guiding Li <[email protected]> wrote:
> *Option 1:*
> spin_lock:                                   spin lock
> spin_lock_nopreempt:                spin_lock + sched_lock
> spin_lock_irqsave:                     spin lock + irqsave
> spin_lock_irqsave_nopreempt:  spin_lock + irq save + sched_lock
>
> *Option 2:*
> spin_lock:                              spin lock + sched_lock
> spin_lock_preempt:               spin_lock
> spin_lock_irqsave:                spin lock + irq save +sched_lock
> spin_lock_irqsave_preempt: spin_lock + irq save
>
> From the correctness level:
> The Option2 is correct, because if you call sem_post()/syslog()/... within
> spin_lock() will cause a system crash.
>
> From the performance level:
> The Option2 has lower efficiency as you said in the NON-SMP case.
>
> Then can we have a method that takes both ?
> Based on option1, we add a check if someone called sem_post()/syslog()...
> then system ASSERT. Alert the people who should change their usage.
> And also the performance will be considered.

Part 2:

On Wed, Feb 5, 2025 at 7:46 AM Xiang Xiao <[email protected]> wrote:
> On Wed, Feb 5, 2025 at 1:06 PM chao an <[email protected]> wrote:
> > My point of view is to restore the original behavior of spin_lock() and
> > spin_lock_irqsave(), which brings the following benefits:
> > 1. The API semantics remain unchanged, and independent developers and
> > projects that use these APIs outside the nuttx repository can keep their
> > code without any adjustment
> > 2. The API naming is consistent with the internal implementation, and the
> > caller can know what is happening inside the function from the API naming
> >
> > *Option 1:*
> >
> > *spin_lock:                   spin lockspin_lock_nopreempt:
> > spin_lock + sched_lockspin_lock_irqsave:           spin lock + irq
> > savespin_lock_irqsave_nopreempt: spin_lock + irq save + sched_lock*
> >
> > @xiaoxiang suggested changing the semantics of the API to hold sched_lock
> > by default, consistent with Linux:
> > 1. This means that all locations in the kernel that call spin_lock and
> > spin_lock_irqsave need to be changed
> >
> Since holding sched locks make spinlock more safe, the change doesn't
> require the change from the caller in general.
>
> > spin_lock_irqsave -> spin_lock_irqsave_preempt
> > 2. It is impossible to know what happens inside the function from the API
> > naming
> >
> > *Option 2:*
> >
> > *spin_lock:                 spin lock + sched_lockspin_lock_preempt:
> >   spin_lockspin_lock_irqsave:         spin lock + irq save +
> > sched_lockspin_lock_irqsave_preempt: spin_lock + irq save*
> >
> > I don't know which definition is better, or if any wise person has a better
> > choice, please give some advice.
>
> Here is the summary why I prefer to add sched_lock in
> spin_lock/spin_lock_irqsave by default:
>
>    - The default(short) api should be safe since not all people understand
>    the schedule behaviour deeply, please see the related discussion:
>
> https://github.com/apache/nuttx/issues/9531
> https://github.com/apache/nuttx/issues/1138
> https://www.kernel.org/doc/Documentation/preempt-locking.txt
>
>
>    - spin_lock without sched_lock is unusable in the normal thread context,
>    since the scheduler may suspend the spinlock owner thread at any time
>    - API semantic align with Linux to avoid Linux developer use the unsafe
>    api without any notification
>    - the caller of enter_critical section can call sem_post/mq_send without
>    problem, spin_lock/spin_lock_irqsave can only achieve the same behaviour by
>    holding sched lock

With the above examples my main concerns are:
1. Functions are not quite elegant and self-explanatory (i.e. why
there are two spinlocks in spin_lockspin_lock_irqsave name? Who will
understand things like spin lock + sched_lockspin_lock_preempt what
does it even mean?).
2. Functions hide a "default" inside, something assumed not explicit,
that is not labeled by the function name.
3. If we change the "default" hidden behavior things for sure get
broken and nobody will know why.
4. We are changing existing kernel api anyways which is bad by design
and I am strongly against it :-)

Why not create backward-compatible variants with fine grained
functionality with self-explanatory names? Users will have a choice on
building blocks to choose, and the responsibility will be on the user
part.. and on our part how well we document it with every possible
explanation and example.. maybe even example application to keep
things crystal clear :-)

spin_lock = spin_lock
spin_lock_shed_lock = spin_lock -> shed_lock
spin_lock_irqsave = spin_lock -> irqsave
spin_lock_irqsave_shed_lock = spin_lock -> irqsave -> shed_lock
spin_lock_shed_lock_irqsave = spin_lock -> shed_lock -> irqsave
(..etc)

This way:
* We maintain backward kernel api compatiblity, all old stuff works as expected.
* Things that need optimization can change without breaking existing
stuff, step by step block by block where needed.
* Function names are elegant and self-explanatory (i.e. spin_lock
tells there is spin_lock, spin_lock_shed_lock tells there will be
spin_lock and no preemption, spin_lock_irqsave tells there is
spin_lock and irqsave, spin_lock_irqsave_shed_lock tells there is
spin_lock with irqsave and no preemption).
* Call order inside is self-explanatory.
* There is zero hidden assumed default behavior.
* This should cover all necessary scenarios.
* This enables Linux compatible kernel API where necessary (linux call
names just wrap and call our calls).
* It will be easier to write and understand documentation.
* It will be easier to write and understand examples.
* It will be easy to provide examples of bad use.
* It will be easy to catch bad commits right away.
* And last but not least it may be easy to identify and wrap SMP code
with ifdefs and simply have no SMP code on no-SMP hardware?

What do you think? :-)

Tomek

-- 
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info

Re: The behavior of spin_lock needs everyone's advice

Reply via email to