sumpfralle opened a new pull request, #17224:
URL: https://github.com/apache/nuttx/pull/17224

   ## Summary
   
   Previously the function `up_cpu_pause` was used during flash write 
operations for preventing all other CPUs from executing code from flash.
   The above function was removed in d8cb7759.
   Thus, writing to flash on `rp2040` was only possible in non-SMP mode.
   
   This PR allows write operations to the flash on rp2040 even in SMP mode. It 
blocks all but the current CPU for the duration of the critical function (write 
or erase).
   
   The implementation is based on the previous implementation (see d8cb7759).
   It is slightly enhanced by supporting more than two CPUs and it avoids 
global variables.
   
   Closes: #16203
   
   *BEWARE*: this is just a draft. Please take a look at the "discussion" at 
the end.
   
   ## Impact
   
   It is possible to write to flash on rp2040 even in SMP mode.
   
   ## Testing
   
   1. enable `CONFIG_PIPES` and `CONFIG_RP2040_FLASH_FILE_SYSTEM`
   1. apply the following patch (disables "smartfs", which would block the mtd):
       ```patch
       --- a/boards/arm/rp2040/common/src/rp2040_common_bringup.c
       +++ b/boards/arm/rp2040/common/src/rp2040_common_bringup.c
       @@ -726,7 +726,7 @@ int rp2040_common_bringup(void)
            }
        #endif
   
       -#ifdef CONFIG_RP2040_FLASH_FILE_SYSTEM
       +#ifdef disabled_CONFIG_RP2040_FLASH_FILE_SYSTEM
   
          mtd_dev = rp2040_flash_mtd_initialize();
       ```
   1. run `rp2040_flash_mtd_initialize();` in your startup code
   1. run in the NuttX shell: `netcat -l 10000 | dd of=/dev/mtd bs=1k seek=512`
   1. run on your host: `nc -N "$nuttx_host" 10000 <filename`
   1. watch the `dd` process in the NuttX shell to finish after the connection 
of `netcat` was closed
   
   Without the patch, this process would hang (after commenting out the removed 
`up_cpu_pause` calls).
   
   Running the `smp` app (`CONFIG_TESTING_SMP`) during the above write 
operation does not cause issues.
   
   ## Details to be discussed?
   1. The proposed implementation of the "pause all other CPUs" feature is 
generic enough to be helpful for other architectures, as well. E.g. the 
`esp32_spiram.c` and `esp32s3_spiram.c` use a similar implementation for 
guarding their flash accesses (see `g_cpu_pause`). `rp2350` will surely needs 
this, too, when flash write support is added.
       * Should I move this set of function (`*_smp_isolation`) somewhere else 
(e.g. below `sched/sched/`)? Which name would be appropriate? 
(`*_smp_isolation` is just my silly choice)
       * Should I propose a separate pull request with the respective changes 
for esp32 in order to allow others to test them? (I do not have such a board)
       * Should I revive `sched_note_cpu_pause`? Currently it is probably 
unused.
   1. The implementation uses "spin_lock_t" (instead of `bool` flags). I picked 
this for readability. Is this a burden? Should I use `bool` instead?
   1. The implementation moves the "enter_critical_section" call closer to the 
flash write operation (inside the "SMP isolation" calls). Previously 
`enter_critical_section` was called *before* `up_cpu_pause`. Now it is called 
afterwards.
       * I am not sure, *why* this is necessary, but the original call location 
leads to random hangs after writing around 50 kB.
   1. The functions `enter_smp_isolation` and `leave_smp_isolation` call 
`sched_lock`/`sched_unlock`. This may feel a bit excessive, but I think, it is 
necessary:
       - a) `nxsched_smp_call_async` tries to run the "pause handler" directly 
on the current CPU, if `this_cpu()` is part of the given cpuset (see 
[source](https://github.com/apache/nuttx/blob/master/sched/sched/sched_smp.c#L317)).
 This would starve our current task and prevent the execution of 
"leave_smp_isolation". This behavior of "nxsched_smp_call_async" is probably a 
bug.
       - b) We should not allow other SMP-aware functions to run, because they 
may rely on SMP being enabled (causing deadlocks?). This is possible, since 
`CONFIG_SMP` is used everywhere as the indicator for "multiple processes 
running in parallel". This is not true during the "smp isolation" context 
created by these functions.
   1. At the moment, the blocked CPUs may receive and handle interrupts. At 
least I do not see anything preventing this. Maybe we should run 
"enter_critical_section" for each busy-locked CPU in order to avoid this?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to