Thank you guys for bringing this discussion on the mailing list! We really should first discuss any breaking change on the list long enough until everyone is happy.. and we are sure not to break anything :-)
Suggestions to verify in depth on a real world hardware any subtle issue will be soon our standard requirement. For now we need to take care of this manually. Staging environment could be a good choice too. This is true, many projects, even industrial ones (I will make one soon big scale, I did already small scale) depend on NuttX API stability, that means long term self-compatibility and trust, this is as important as runtime stability, maybe even more in these applications. This is something unique to NuttX and we should really respect and nurture :-) Not many of us have this privilege to be paid for every commit, but if things get broken all of us will loose money and customers and trust. Therefore we should really avoid any breaking changes.. and focusing on alternatives instead. >From "user" point of view, if the change breaks kernel api, it should be rejected, also reverted if already applied. If this is the case right now and reverting will fix the release we should revert the change and make a patch release 12.8.1. Is this the main problem right now and will that fix 12.8.0? Then we should go for it :-) For the later part I need to quote two parts so we have a solid reference point :-) Part 1: On Wed, Feb 5, 2025 at 12:59 PM Guiding Li <ligd...@gmail.com> wrote: > *Option 1:* > spin_lock: spin lock > spin_lock_nopreempt: spin_lock + sched_lock > spin_lock_irqsave: spin lock + irqsave > spin_lock_irqsave_nopreempt: spin_lock + irq save + sched_lock > > *Option 2:* > spin_lock: spin lock + sched_lock > spin_lock_preempt: spin_lock > spin_lock_irqsave: spin lock + irq save +sched_lock > spin_lock_irqsave_preempt: spin_lock + irq save > > From the correctness level: > The Option2 is correct, because if you call sem_post()/syslog()/... within > spin_lock() will cause a system crash. > > From the performance level: > The Option2 has lower efficiency as you said in the NON-SMP case. > > Then can we have a method that takes both ? > Based on option1, we add a check if someone called sem_post()/syslog()... > then system ASSERT. Alert the people who should change their usage. > And also the performance will be considered. Part 2: On Wed, Feb 5, 2025 at 7:46 AM Xiang Xiao <xiaoxiang781...@gmail.com> wrote: > On Wed, Feb 5, 2025 at 1:06 PM chao an <magicd...@gmail.com> wrote: > > My point of view is to restore the original behavior of spin_lock() and > > spin_lock_irqsave(), which brings the following benefits: > > 1. The API semantics remain unchanged, and independent developers and > > projects that use these APIs outside the nuttx repository can keep their > > code without any adjustment > > 2. The API naming is consistent with the internal implementation, and the > > caller can know what is happening inside the function from the API naming > > > > *Option 1:* > > > > *spin_lock: spin lockspin_lock_nopreempt: > > spin_lock + sched_lockspin_lock_irqsave: spin lock + irq > > savespin_lock_irqsave_nopreempt: spin_lock + irq save + sched_lock* > > > > @xiaoxiang suggested changing the semantics of the API to hold sched_lock > > by default, consistent with Linux: > > 1. This means that all locations in the kernel that call spin_lock and > > spin_lock_irqsave need to be changed > > > Since holding sched locks make spinlock more safe, the change doesn't > require the change from the caller in general. > > > spin_lock_irqsave -> spin_lock_irqsave_preempt > > 2. It is impossible to know what happens inside the function from the API > > naming > > > > *Option 2:* > > > > *spin_lock: spin lock + sched_lockspin_lock_preempt: > > spin_lockspin_lock_irqsave: spin lock + irq save + > > sched_lockspin_lock_irqsave_preempt: spin_lock + irq save* > > > > I don't know which definition is better, or if any wise person has a better > > choice, please give some advice. > > Here is the summary why I prefer to add sched_lock in > spin_lock/spin_lock_irqsave by default: > > - The default(short) api should be safe since not all people understand > the schedule behaviour deeply, please see the related discussion: > > https://github.com/apache/nuttx/issues/9531 > https://github.com/apache/nuttx/issues/1138 > https://www.kernel.org/doc/Documentation/preempt-locking.txt > > > - spin_lock without sched_lock is unusable in the normal thread context, > since the scheduler may suspend the spinlock owner thread at any time > - API semantic align with Linux to avoid Linux developer use the unsafe > api without any notification > - the caller of enter_critical section can call sem_post/mq_send without > problem, spin_lock/spin_lock_irqsave can only achieve the same behaviour by > holding sched lock With the above examples my main concerns are: 1. Functions are not quite elegant and self-explanatory (i.e. why there are two spinlocks in spin_lockspin_lock_irqsave name? Who will understand things like spin lock + sched_lockspin_lock_preempt what does it even mean?). 2. Functions hide a "default" inside, something assumed not explicit, that is not labeled by the function name. 3. If we change the "default" hidden behavior things for sure get broken and nobody will know why. 4. We are changing existing kernel api anyways which is bad by design and I am strongly against it :-) Why not create backward-compatible variants with fine grained functionality with self-explanatory names? Users will have a choice on building blocks to choose, and the responsibility will be on the user part.. and on our part how well we document it with every possible explanation and example.. maybe even example application to keep things crystal clear :-) spin_lock = spin_lock spin_lock_shed_lock = spin_lock -> shed_lock spin_lock_irqsave = spin_lock -> irqsave spin_lock_irqsave_shed_lock = spin_lock -> irqsave -> shed_lock spin_lock_shed_lock_irqsave = spin_lock -> shed_lock -> irqsave (..etc) This way: * We maintain backward kernel api compatiblity, all old stuff works as expected. * Things that need optimization can change without breaking existing stuff, step by step block by block where needed. * Function names are elegant and self-explanatory (i.e. spin_lock tells there is spin_lock, spin_lock_shed_lock tells there will be spin_lock and no preemption, spin_lock_irqsave tells there is spin_lock and irqsave, spin_lock_irqsave_shed_lock tells there is spin_lock with irqsave and no preemption). * Call order inside is self-explanatory. * There is zero hidden assumed default behavior. * This should cover all necessary scenarios. * This enables Linux compatible kernel API where necessary (linux call names just wrap and call our calls). * It will be easier to write and understand documentation. * It will be easier to write and understand examples. * It will be easy to provide examples of bad use. * It will be easy to catch bad commits right away. * And last but not least it may be easy to identify and wrap SMP code with ifdefs and simply have no SMP code on no-SMP hardware? What do you think? :-) Tomek -- CeDeROM, SQ7MHZ, http://www.tomek.cedro.info