jlaitine opened a new issue, #16430:
URL: https://github.com/apache/nuttx/issues/16430

   ### Description / Steps to reproduce the issue
   
   **Problem statement**
   
   This is an old bug/vulnerability that has been in NuttX for ages. Signal 
delivery is done from several contexts, and in some cases the only way to 
execute the signal action properly is to keep the critical section held.
   
   For example, for risc-v there is code like this:
   
   
https://github.com/apache/nuttx/blob/f4cc2c262853bce7321bf82661a93426eedfd854/arch/risc-v/src/common/riscv_schedulesigaction.c#L93
   
   Similar code exists for all platforms; some platforms even have more 
branches to deal with different situations.
   
   The user has no way of knowing that sometimes (seemingly at random) 
interrupts and scheduling are disabled for the whole duration of the signal 
action handler.
   
   **Suggested resolution**
   
   In principle, all of this is unnecessary. Signals should be executed from 
only one place/context on every platform. The correct place is after the 
context switch, before returning to the user code. And in order to keep the 
real-time behaviour, there should be a way to directly trigger the context 
switch at the time of signal dispatch.
   
   Also, in SMP it is not necessary to force the delivery of the signal by 
locking the other CPU and scheduling the signal action via an SMP call. It 
should be enough just to add the action to the pending action queue and 
trigger a re-schedule on that CPU. If the task/thread happens to migrate to 
another CPU before the triggered re-schedule happens, that doesn't matter: the 
signal action is executed anyway when the thread is next scheduled to run on 
the other CPU.
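   
   The idea above can be illustrated with a small toy model. This is only a 
sketch under stated assumptions, not NuttX code: "toy_tcb", 
"toy_sig_queue_action", "toy_return_to_user" and the per-CPU flag array are 
hypothetical names invented for illustration. The dispatch side only queues 
the action and requests a re-schedule; the handler runs from a single place, 
on whichever CPU the task is scheduled on next.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NCPUS       2
#define TOY_MAX_PENDING 8

/* Hypothetical, simplified task control block; not NuttX's struct tcb_s. */

struct toy_tcb
{
  int    pending[TOY_MAX_PENDING]; /* Queued signal numbers */
  size_t npending;                 /* Number of queued actions */
  void (*handler)(int signo);      /* User signal action */
  int    cpu;                      /* CPU the task last ran on */
};

static bool g_resched_requested[TOY_NCPUS]; /* Stand-in for an IPI/resched */
static int  g_handler_calls;                /* Bookkeeping for the example */
static int  g_last_signo;

/* Example handler that only records what it was called with. */

void toy_record_handler(int signo)
{
  g_handler_calls++;
  g_last_signo = signo;
}

/* Dispatch side: queue the action and request a re-schedule on the CPU
 * the task was last seen on. The handler is NOT run here, and no other
 * CPU is locked or paused.
 */

bool toy_sig_queue_action(struct toy_tcb *tcb, int signo)
{
  assert(tcb != NULL && tcb->cpu >= 0 && tcb->cpu < TOY_NCPUS);

  if (tcb->npending >= TOY_MAX_PENDING)
    {
      return false;
    }

  tcb->pending[tcb->npending++] = signo;
  g_resched_requested[tcb->cpu] = true;
  return true;
}

/* Single delivery point: models the end of the context switch path, just
 * before returning to user code on 'cpu'. Returns the number of actions
 * delivered. It does not matter whether 'cpu' is the CPU on which the
 * re-schedule was originally requested.
 */

size_t toy_return_to_user(struct toy_tcb *tcb, int cpu)
{
  size_t delivered = 0;
  size_t i;

  assert(tcb != NULL && cpu >= 0 && cpu < TOY_NCPUS);

  g_resched_requested[cpu] = false;
  tcb->cpu = cpu;

  for (i = 0; i < tcb->npending; i++)
    {
      if (tcb->handler != NULL)
        {
          tcb->handler(tcb->pending[i]);
          delivered++;
        }
    }

  tcb->npending = 0;
  return delivered;
}
```

   In this model a task that was queued a signal while on CPU 0 and is then 
scheduled on CPU 1 still gets its handler called exactly once, from 
"toy_return_to_user"; the stale re-schedule request on CPU 0 is harmless.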
   
   Some time ago, I made a quick hack to test an alternative way to schedule 
signal actions for risc-v, removing the critical section:
   
   
https://github.com/apache/nuttx/blob/f4cc2c262853bce7321bf82661a93426eedfd854/arch/risc-v/src/common/riscv_schedulesigaction.c#L93
   
   This works fine in flat mode, on platforms where the context switch is done 
via the IRQ. But obviously this is not a generic solution: not all platforms 
enter irq_dispatch() at context switch.
   
   So, in order to fix this, one would need to:
   
   - Identify the correct place in the code to schedule the signal action for 
each platform (in the context switch path). There should be just one place and 
one context to call the "up_schedule_sigaction".
   - Identify the correct way to trigger a context switch from within 
"nxsig_queue_action"; for the single-CPU case "up_switch_context" is probably 
the easiest one. In the SMP case the cleanest solution is probably a dedicated 
syscall, common to all platforms, that requests a re-schedule. This can be 
triggered via an IPI in the same way as the "smp_call" is done.
   - Clean up the "up_schedule_sigaction" for each platform, removing the extra 
branches.
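   
   As a rough illustration of the second item, the choice of trigger could be 
reduced to a single decision like the one below. This is a hypothetical 
sketch, not NuttX code: the enum and function names are made up, and the 
"no trigger needed when the task is not currently running" case is an 
assumption that follows from delivering actions only on the context switch 
path.

```c
#include <assert.h>

/* Hypothetical task state and trigger kinds, for illustration only. */

enum toy_task_state
{
  TOY_TASK_NOT_RUNNING, /* Blocked or ready-to-run, not on any CPU */
  TOY_TASK_RUNNING      /* Currently executing on some CPU */
};

enum toy_trigger
{
  TOY_TRIGGER_NONE,         /* Delivered when the task is next scheduled */
  TOY_TRIGGER_LOCAL_SWITCH, /* Context switch on this CPU (the issue
                             * suggests "up_switch_context" here) */
  TOY_TRIGGER_IPI           /* Ask the other CPU to re-schedule */
};

/* Decide how the signal dispatcher should trigger the context switch
 * after the action has been queued.
 */

enum toy_trigger toy_pick_trigger(enum toy_task_state state,
                                  int task_cpu, int this_cpu)
{
  assert(task_cpu >= 0 && this_cpu >= 0);

  if (state != TOY_TASK_RUNNING)
    {
      return TOY_TRIGGER_NONE;
    }

  return task_cpu == this_cpu ? TOY_TRIGGER_LOCAL_SWITCH : TOY_TRIGGER_IPI;
}
```

   With a single decision point like this, the per-platform 
"up_schedule_sigaction" would no longer need to distinguish between calling 
contexts.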
   
   I started looking into this some time ago but never found the time to 
continue, so I decided to just write up this issue so that anyone interested 
can pick it up.
   
   
   ### On which OS does this issue occur?
   
   [OS: Linux]
   
   ### What is the version of your OS?
   
   22.04.1-Ubuntu
   
   ### NuttX Version
   
   master
   
   ### Issue Architecture
   
   [Arch: all]
   
   ### Issue Area
   
   [Area: Kernel]
   
   ### Host information
   
   _No response_
   
   ### Verification
   
   - [x] I have verified before submitting the report.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@nuttx.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
