wyr-7 commented on PR #18210: URL: https://github.com/apache/nuttx/pull/18210#issuecomment-3822130045
> Please provide test logs for these cases:
>
> > ```
> > Verified on QEMU ARMv7-A simulator with multimedia profile
> > Reader-writer semaphore operations verified:
> > Single reader access patterns
> > Single writer access patterns
> > Mixed reader-writer contention scenarios
> > Waiter notification behavior verified
> > No unnecessary context switches on optimized paths
> > Static analysis shows improved compliance metrics
> > ```
>
> > Testing
> > Reader-writer semaphore scenarios:
> > Multiple readers with no writers
> > Multiple writers (exclusive access)
> > Mixed reader-writer access patterns
> > Waiter wake-up correctness
> > Lock holder tracking
> > Context switch reduction verification
>
> > Performance: Reduced context switch overhead in high-contention scenarios
>
> How did you verify this performance improvement? Can you please share the scenario and the results?

Thank you for the review.

This optimization has been validated in Vela OS (Xiaomi's embedded operating system based on NuttX) and is running in production across multiple product lines including wearable devices, IoT devices, and automotive systems.

The optimization addresses two inefficiencies:

- up_write(): Original code calls up_wait() unconditionally, even during partial release of recursive locks (when writer > 0). This causes unnecessary semaphore posts when no waiter can actually acquire the lock.
- up_read(): Original code checks waiter > 0 instead of reader > 0, potentially calling up_wait() while other readers still hold the lock. Write waiters wake up, find the lock unavailable, and go back to sleep.

The fix ensures up_wait() is only called when the lock is actually available for acquisition - when writer reaches 0 or when the last reader releases.

This has been running in Vela OS production for several months on ARM Cortex-A SMP platforms.
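To make the two release-path conditions concrete, here is a minimal sketch in C. It is a simplified model under stated assumptions, not the NuttX source: struct model_rwsem, model_up_wait(), model_up_write(), model_up_read() and the optimized flag are hypothetical names introduced for illustration, and a plain counter stands in for the nxsem_post() calls so the number of wake-ups can be compared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the rwsem state. The field names follow the
 * counters mentioned above (reader, writer, waiter); this is NOT the
 * actual NuttX structure.
 */

struct model_rwsem
{
  int reader;  /* Number of readers currently holding the lock */
  int writer;  /* Recursion depth of the writer holding the lock */
  int waiter;  /* Number of threads blocked waiting for the lock */
  int posts;   /* Wake-ups issued so far (stand-in for nxsem_post()) */
};

/* Stand-in for up_wait(): issue one wake-up per waiter. The waiter
 * count is left unchanged here; in this model a woken thread that
 * cannot take the lock simply keeps waiting.
 */

static void model_up_wait(struct model_rwsem *rw)
{
  rw->posts += rw->waiter;
}

/* Writer release. With optimized == false, waiters are woken on every
 * release, including partial releases of a recursive lock. With
 * optimized == true, they are woken only when writer reaches 0.
 */

static void model_up_write(struct model_rwsem *rw, bool optimized)
{
  assert(rw->writer > 0);
  rw->writer--;

  if (!optimized || rw->writer == 0)
    {
      model_up_wait(rw);
    }
}

/* Reader release. With optimized == false, waiters are woken whenever
 * any waiter exists. With optimized == true, they are woken only when
 * the last reader releases.
 */

static void model_up_read(struct model_rwsem *rw, bool optimized)
{
  assert(rw->reader > 0);
  rw->reader--;

  if (optimized ? (rw->reader == 0) : (rw->waiter > 0))
    {
      model_up_wait(rw);
    }
}
```

In this model, fully releasing a write lock held at recursion depth 2 with 3 waiters issues 6 wake-ups on the unconditional path and 3 on the conditional path. Releasing 2 readers with 1 waiting writer issues 2 wake-ups versus 1. The model only counts wake-ups and says nothing about timing on real hardware.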
We observed:

- Reduced scheduler overhead in recursive locking scenarios (VFS layer, graphics subsystem)
- Improved responsiveness under high reader concurrency
- No regressions in extensive stress testing and long-running stability tests

The issue was identified through production profiling showing unnecessary nxsem_post() calls and spurious wake-ups in rwsem release paths.

Correctness

The optimization preserves all semantics - waiters are still woken at the correct time, all operations remain mutex-protected, and it passes all existing tests. It simply eliminates redundant wake-up operations.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
