This is an automated email from the ASF dual-hosted git repository. jerpelea pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-nuttx.git
The following commit(s) were added to refs/heads/master by this push: new 6ec9408 sched: irq: Fix enter_critical_section() in an irq handler for SMP 6ec9408 is described below commit 6ec94082a10a914acc50330ab40f63a425fa67d0 Author: Masayuki Ishikawa <masayuki.ishik...@gmail.com> AuthorDate: Mon Nov 2 17:06:21 2020 +0900 sched: irq: Fix enter_critical_section() in an irq handler for SMP Summary: - I found a deadlock during Wi-Fi audio streaming test plus stress test - The testing environment was spresense:wifi_smp (NCPUS=4) - The deadlock happened because two CPUs called up_cpu_pause() almost simultaneously - This situation should not happen, because up_cpu_pause() is called in a critical section - Actually, the latter call was from nxsem_post() in an IRQ handler - And when enter_critical_section() was called, irq_waitlock() detected a deadlock - Then it called up_cpu_paused() to break the deadlock - However, this resulted in setting g_cpu_irqset on the CPU - Even though another CPU had held a g_cpu_irqlock - This situation violates the critical section and should be avoided - To avoid the situation, if a CPU sets g_cpu_irqset after calling up_cpu_paused() - The CPU must release g_cpu_irqlock first - Then retry irq_waitlock() to acquire g_cpu_irqlock Impact: - Affect SMP Testing: - Tested with spresense:wifi_smp (NCPUS=2 and 4) - Tested with spresense:smp - Tested with sim:smp - Tested with sabre-6quad:smp (QEMU) - Tested with maix-bit:smp (QEMU) - Tested with esp32-core:smp (QEMU) - Tested with lc823450-xgevk:rndis Signed-off-by: Masayuki Ishikawa <masayuki.ishik...@jp.sony.com> --- sched/irq/irq_csection.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/sched/irq/irq_csection.c b/sched/irq/irq_csection.c index 6993641..9eeb4b4 100644 --- a/sched/irq/irq_csection.c +++ b/sched/irq/irq_csection.c @@ -265,6 +265,7 @@ try_again: * no longer blocked by the critical section). */ +try_again_in_irq: if (!irq_waitlock(cpu)) { /* We are in a deadlock condition due to a pending @@ -273,6 +274,24 @@ try_again: */ DEBUGVERIFY(up_cpu_paused(cpu)); + + /* NOTE: As the result of up_cpu_paused(cpu), this CPU + * might set g_cpu_irqset in nxsched_resume_scheduler() + * However, another CPU might hold g_cpu_irqlock. + * To avoid this situation, releae g_cpu_irqlock first. + */ + + if ((g_cpu_irqset & (1 << cpu)) != 0) + { + spin_clrbit(&g_cpu_irqset, cpu, &g_cpu_irqsetlock, + &g_cpu_irqlock); + } + + /* NOTE: Here, this CPU does not hold g_cpu_irqlock, + * so call irq_waitlock(cpu) to acquire g_cpu_irqlock. + */ + + goto try_again_in_irq; } }