linguini1 commented on PR #17224:
URL: https://github.com/apache/nuttx/pull/17224#issuecomment-3429959657

   > ## Summary
   > 
   > Previously the function `up_cpu_pause` was used during flash write 
operations for preventing all other CPUs from executing code from flash. The 
above function was removed in 
[d8cb775](https://github.com/apache/nuttx/commit/d8cb7759b657cf92dfb66aa8cf8c4ff6391a4d40).
 Thus, writing to flash on `rp2040` was only possible in non-SMP mode.
   > 
   > This PR allows write operations to the flash on rp2040 even in SMP mode. 
It blocks all but the current CPU for the duration of the critical function 
(write or erase).
   > 
   > The implementation is based on the previous implementation (see 
[d8cb775](https://github.com/apache/nuttx/commit/d8cb7759b657cf92dfb66aa8cf8c4ff6391a4d40)).
 It is slightly enhanced by supporting more than two CPUs and it avoids global 
variables.
   
   Thank you for the detailed test information! It's always appreciated a lot.
   
   Just to wrap my head around this, did d8cb7759 break the RP2040 code for SMP 
flash writing entirely? As in, it worked before this commit and then began to 
fail after that commit (I assume you found this with `git bisect`)? If so, that 
is a huge issue with the way that commit was tested before merge, we might want 
to see if there are other cases where things broke.
   
   > ## Details to be discussed?
   > 
   >     1. The proposed implementation of the "pause all other CPUs" feature 
is generic enough to be helpful for other architectures, as well. E.g. the 
`esp32_spiram.c` and `esp32s3_spiram.c` use a similar implementation for 
guarding their flash accesses (see `g_cpu_pause`). `rp2350` will surely needs 
this, too, when flash write support is added.
   >        
   >        * Should I move this set of function (`*_smp_isolation`) somewhere 
else (e.g. below `sched/sched/`)? Which name would be appropriate? 
(`*_smp_isolation` is just my silly choice)
   
   I think that's not a bad idea if it's generally applicable. I think it would 
be fine to merge this fix first since it solves (presumably) a regression, and 
then afterward you could see about re-factoring the solution to use your more 
general implementation?
   
   >        * Should I propose a separate pull request with the respective 
changes for esp32 in order to allow others to test them? (I do not have such a 
board)
   
   Yes, there are some people working on Espressif boards and can probably run 
your changes against their internal CI to catch anything. I would like to note 
that it's very appreciated that you mention this, since there are a lot of PRs 
which only test one board and assume the changes apply just as well elsewhere.
   
   >        * Should I revive `sched_note_cpu_pause`? Currently it is probably 
unused.
   
   What I'm wondering is, do the `up_cpu_pause/resume` and etc. functions 
removed in the linked commit not achieve the same task that you'd achieve by 
creating your general SMP isolation functions? Because then you could just 
revive the old implementations. I haven't looked thoroughly, so if your new 
implementation would be more streamlined (you mentioned no globals) then just 
go with that. I'm not familiar with the semantics of `sched_note_cpu_pause` vs 
`up_cpu_pause`, etc.
    
   >     2. The implementation uses "spin_lock_t" (instead of `bool` flags). I 
picked this for readability. Is this a burden? Should I use `bool` instead?
   > 
   >     3. The implementation moves the "enter_critical_section" call closer 
to the flash write operation (inside the "SMP isolation" calls). Previously 
`enter_critical_section` was called _before_ `up_cpu_pause`. Now it is called 
afterwards.
   >        
   >        * I am not sure, _why_ this is necessary, but the original call 
location leads to random hangs after writing around 50 kB.
   > 
   >     4. The functions `enter_smp_isolation` and `leave_smp_isolation` call 
`sched_lock`/`sched_unlock`. This may feel a bit excessive, but I think, it is 
necessary:
   >    
   
   Can't really help with any of this, sorry. I'm not familiar enough with much 
of the scheduling logic to say anything of value.
       
   >        * a) `nxsched_smp_call_async` tries to run the "pause handler" 
directly on the current CPU, if `this_cpu()` is part of the given cpuset (see 
[source](https://github.com/apache/nuttx/blob/master/sched/sched/sched_smp.c#L317)).
 This would starve our current task and prevent the execution of 
"leave_smp_isolation". This behavior of "nxsched_smp_call_async" is probably a 
bug.
   
   Maybe you could raise an issue for it and post to the mailing list to get 
more eyes on this?
   
   >        * b) We should not allow other SMP-aware functions to run, because 
they may rely on SMP being enabled (causing deadlocks?). This is possible, 
since `CONFIG_SMP` is used everywhere as the indicator for "multiple processes 
running in parallel". This is not true during the "smp isolation" context 
created by these functions.
   > 
   >     5. At the moment, the blocked CPUs may receive and handle interrupts. 
At least I do not see anything preventing this. Maybe we should run 
"enter_critical_section" for each busy-locked CPU in order to avoid this?
   
   Ditto, not informed enough to help with this either.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to