tmedicci commented on PR #10496: URL: https://github.com/apache/nuttx/pull/10496#issuecomment-1762851740
> You can see that the problem is caused by mixed using `up_irq_save` and `enter_critical_section`. Is this a permitted pattern? > > * if it is, then this PR is needed to fix up. > * if it isn't, the this PR is not needed. We must use only `enter_critical_section`, instead of `up_irq_save` in smp case? Just adding some information on this topic: I've found a similar issue with ESP32-S3 + SMP in a WIP development regarding a function that performs an SPI flash operation in [`esp32s3_phy_enable`](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/arch/xtensa/src/esp32s3/esp32s3_wireless.c#L455) Basically, this function will call (WIP) another function to read data from SPI flash. This operation requires disabling the non-IRAM interrupts on both cores in [`spiflash_start`](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/arch/xtensa/src/esp32s3/esp32s3_spiflash.c#L261). This function [posts to a semaphore](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/arch/xtensa/src/esp32s3/esp32s3_spiflash.c#L289) to make a pinned-to-the-other-core task to make it run a busy loop while the operation takes place. [`nxsem_post`](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/sched/semaphore/sem_post.c#L70) calls [`nxsched_add_readytorun`](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/sched/sched/sched_addreadytorun.c#L152) which will [pause the other CPU](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/sched/sched/sched_addreadytorun.c# L260) to set up the `g_assignedtasks` on the other core and add the task that performs the busy loop. This operation causes a deadlock because of the critical section of [`esp32s3_phy_enable`](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/arch/xtensa/src/esp32s3/esp32s3_wireless.c#L468). The CPU pause will hang [here](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/arch/xtensa/src/common/xtensa_cpupause.c#L196): not because of the CPU pause operation (the other CPU was already resumed), but because this `enter_critical_section` will keep looping in [`irq_waitlock`](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/sched/irq/irq_csection.c#L123) because it can't acquire the spinlock (owned by `esp32s3_phy_enable`). This situation is avoided by not entering a critical section in [`esp32s3_phy_enable`](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/arch/xtensa/src/esp32s3/esp32s3_wireless.c#L468). Using `spin_lock_irqsave(NULL)` fixes the issue. Please note that [`spin_lock_irqsave`](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/sched/irq/irq_spinlock.c#L85) calls `up_irq_save`, so I'd say that calling it is not an issue to perform a CPU pause operation, the problem is performing this operation in a nested critical section (called by `enter_critical_section`). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@nuttx.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org