Hello Xiaomi dev team,
If I understand correctly spinlocks are basically a no-op when no SMP is
involved, which is the majority of CPUs supported by NuttX. You shall
not break all platforms.
I urge you all to revert any commit that might have broken something in
this domain, redesign it carefully on your own code fork, and then push
a final working version to nuttx.
Even if nuttx has releases, you are basically pushing untested changes
to a main branch (in core code!) that is used by many people.
You are critically breaking nuttx for everyone.
You are a large company with financial resources, so PLEASE do your job,
install a gitea or gitlab on a raspberry pi in your company and do your
dev in your private fork.
It is too many influence to push THAT MUCH amount of breakage to anyone.
The NuttX project is mature, it has real industrial users, it is not a
playground for your funny OS hacking.
PLEASE RESPOND and ACT to save the project. This problem need to be
adressed at the root level.
Let me remind you that NuttX 12.8.0 is fully broken on my stm32f4 board
and that I cannot repair it until next week because my schedule is fully
booked on other critical project.
Our customer does not understand why adding tcp keepalive requires that
much work. We are going to loose customers because of your NuttX bad
community behaviour.
I do not want to additionnally debug your locking crap on a cpu that
does not need any of it.
PLEASE DO SOMETHING RESPONSIBLE.
Sebastien
On 05/02/2025 08:13, chao an wrote:
>It isn't an initialization problem, the real cause is some code abusing
>spin lock(lock/unlock in the different thread).
>After holding sched_lock in spinlock_irqsave, the api requires that the
>lock/unlock come from the same thread.
So this brings up a potential problem with spin_lock, right?
>All performance critical code is evaluating carefully to skip the
>sched_lock/sched_unlock, like this:
>https://github.com/apache/nuttx/pull/15695
>And the optimization of sched_lock is under heavy development.
*raw_spin_lock_irqsave()* is not equivalent to without *sched_lock()*.
On the contrary, in the Linux kernel, raw_spin_lock_irqsave() means
that preemption is disabled by default.
https://github.com/torvalds/linux/blob/master/include/linux/spinlock_api_smp.h#L104-L113
image.png
>Here is the summary why I prefer to add sched_lock in
>spin_lock/spin_lock_irqsave by default:
> - The default(short) api should be safe since not all people
understand
> the schedule behaviour deeply, please see the related discussion:
> https://github.com/apache/nuttx/issues/9531
> https://github.com/apache/nuttx/issues/1138
> https://www.kernel.org/doc/Documentation/preempt-locking.txt
> - spin_lock without sched_lock is unusable in the normal thread
context,
> since the scheduler may suspend the spinlock owner thread at any time
1. spin_lock can be applied in scenarios requiring more fine-grained
control. For example, in some cases, we can conditionally decide
whether to hold the spin_lock, rather than simply forcing all users to
use the version with sched_lock:
image.png
> - API semantic align with Linux to avoid Linux developer use the
unsafe
> api without any notification
2. NuttX is not Linux. We need to enable API users to truly understand
what happens inside the API just by looking at its name.
> - the caller of enter_critical section can call sem_post/mq_send
without
> problem, spin_lock/spin_lock_irqsave can only achieve the same
behaviour by
> holding sched lock
3. We should handle scenarios that may cause scheduling differently.
We should use *spin_lock_irqsave_nopreempt()* instead of providing
developers with a low - performance version by default.