tmedicci commented on PR #10496:
URL: https://github.com/apache/nuttx/pull/10496#issuecomment-1762851740

   > You can see that the problem is caused by mixed using `up_irq_save` and 
`enter_critical_section`. Is this a permitted pattern?
   > 
   > * if it is, then this PR is needed to fix up.
   > * if it isn't, the this PR is not needed. We must use only 
`enter_critical_section`, instead of `up_irq_save` in smp case?
   
   Just adding some information on this topic:
   
   I've found a similar issue with ESP32-S3 + SMP in a WIP development 
regarding a function that performs an SPI flash operation in 
[`esp32s3_phy_enable`](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/arch/xtensa/src/esp32s3/esp32s3_wireless.c#L455)
   
   Basically, this function will call (WIP) another function to read data from 
SPI flash. This operation requires disabling the non-IRAM interrupts on both 
cores in 
[`spiflash_start`](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/arch/xtensa/src/esp32s3/esp32s3_spiflash.c#L261).
 This function [posts to a 
semaphore](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/arch/xtensa/src/esp32s3/esp32s3_spiflash.c#L289)
 to make a pinned-to-the-other-core task to make it run a busy loop while the 
operation takes place.  
[`nxsem_post`](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/sched/semaphore/sem_post.c#L70)
 calls 
[`nxsched_add_readytorun`](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/sched/sched/sched_addreadytorun.c#L152)
 which will [pause the other 
CPU](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/sched/sched/sched_addreadytorun.c#
 L260) to set up the `g_assignedtasks` on the other core and add the task that 
performs the busy loop.
   
   This operation causes a deadlock because of the critical section of 
[`esp32s3_phy_enable`](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/arch/xtensa/src/esp32s3/esp32s3_wireless.c#L468).
 The CPU pause will hang 
[here](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/arch/xtensa/src/common/xtensa_cpupause.c#L196):
 not because of the CPU pause operation (the other CPU was already resumed),  
but because this `enter_critical_section` will keep looping in 
[`irq_waitlock`](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/sched/irq/irq_csection.c#L123)
 because it can't acquire the spinlock (owned by `esp32s3_phy_enable`).
   
   This situation is avoided by not entering a critical section in  
[`esp32s3_phy_enable`](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/arch/xtensa/src/esp32s3/esp32s3_wireless.c#L468).
 Using `spin_lock_irqsave(NULL)` fixes the issue.
   
   Please note that 
[`spin_lock_irqsave`](https://github.com/apache/nuttx/blob/1989749850eff1b57bf543983ddffbdc8307d071/sched/irq/irq_spinlock.c#L85)
 calls `up_irq_save`, so I'd say that calling it is not an issue to perform a 
CPU pause operation, the problem is performing this operation in a nested 
critical section (called by `enter_critical_section`).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to