tmedicci commented on PR #16673: URL: https://github.com/apache/nuttx/pull/16673#issuecomment-3188301312
@jlaitine Just tested it and we still have the same issue. Let me try to explain it further. Currently, we are testing [this](https://github.com/tmedicci/nuttx/tree/fix_smp_scheduling) branch: It's basically your branch + the suggested change about the spiflash driver tasks (https://github.com/tmedicci/nuttx/commit/95150eae198742196a63b04c18a5ffd3c204668f). It still fails to run the same example I reported yesterday (here: https://github.com/apache/nuttx/pull/16673#pullrequestreview-3117567657). The device initializes, but the `nsh>` freezes. By checking at the debugger, I could see that only one of the `spi_flash_op_block_task` ran in the first core. The same branch didn't run at all in the second core. Interestingly, by halting the device after typing anything, I can check the following values for `g_readytorun` and `g_assignedtasks`: ``` +p g_readytorun $1 = {head = 0x3fc92578, tail = 0x3fc911a8} +p g_assignedtasks $2 = {0x3fc8ae00 <g_idletcb>, 0x3fc8aed4 <g_idletcb+212>} ``` Please note that both cores are running the idle thread (`g_assignedtasks`), but we have two pending tasks at `g_readytorun` and, curiously, the head TCB's (`0x3fc92578`) refers to the `spi_flash_op_block_task` task that should have run in the second core. The other task at tail (`0x3fc911a8`) refers to the UART handler that should process the typed character and echoes it back to nsh. > I didn't quite understand what you mean with inner/outer loops etc; so maybe share the modified spiflash driver code to help me understand the issue better? I referred to the loop that creates both `spi_flash_op_block_task` [here](https://github.com/tmedicci/incubator-nuttx/blob/95150eae198742196a63b04c18a5ffd3c204668f/arch/xtensa/src/esp32s3/esp32s3_spiflash.c#L1465). Please check that the scheduler is locked before the loop starts (and unlocked when it finishes). Also, please note that - as suggested - we protected the thread's creation/affinity within the critical section at [`spiflash_init_spi_flash_op_block_task`](https://github.com/tmedicci/incubator-nuttx/blob/95150eae198742196a63b04c18a5ffd3c204668f/arch/xtensa/src/esp32s3/esp32s3_spiflash.c#L1013). In order to test it, I applied the following patch: ``` diff --git a/arch/xtensa/src/esp32s3/esp32s3_spiflash.c b/arch/xtensa/src/esp32s3/esp32s3_spiflash.c index b34cead684..f83066674a 100644 --- a/arch/xtensa/src/esp32s3/esp32s3_spiflash.c +++ b/arch/xtensa/src/esp32s3/esp32s3_spiflash.c @@ -1004,14 +1004,11 @@ static int spiflash_init_spi_flash_op_block_task(int cpu) char *argv[2]; char arg1[32]; cpu_set_t cpuset; - irqstate_t flags; snprintf(arg1, sizeof(arg1), "%p", &cpu); argv[0] = arg1; argv[1] = NULL; - flags = enter_critical_section(); - pid = kthread_create("spiflash_op", SCHED_PRIORITY_MAX, CONFIG_ESP32S3_SPIFLASH_OP_TASK_STACKSIZE, @@ -1031,8 +1028,6 @@ static int spiflash_init_spi_flash_op_block_task(int cpu) ret = -EPERM; } - leave_critical_section(flags); - return ret; } #endif /* CONFIG_SMP */ @@ -1460,7 +1455,7 @@ int esp32s3_spiflash_init(void) nxrmutex_init(&g_flash_op_mutex); #ifdef CONFIG_SMP - sched_lock(); + irqstate_t flags = enter_critical_section(); for (cpu = 0; cpu < CONFIG_SMP_NCPUS; cpu++) { @@ -1473,7 +1468,7 @@ int esp32s3_spiflash_init(void) } } - sched_unlock(); + leave_critical_section(flags); #else UNUSED(cpu); #endif ``` Just removed the critical section from `spiflash_init_spi_flash_op_block_task` and protected the whole `spiflash_init_spi_flash_op_block_task` block outside the loop, by replacing the `sched_lock/unlock` for the critical section. **That worked!** So, in summary, although this last patch made it work, it still seems to have something wrong with `sched_lock/unlock` because - without this last patch - the `spi_flash_op_block_task` for the second core remains forever on `g_readytorun` and prevents other tasks on that list from running too... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@nuttx.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org