hartmannathan commented on PR #17468:
URL: https://github.com/apache/nuttx/pull/17468#issuecomment-3660536525

   > @anchao @acassis 
   > 
   > **Patch Motivation & Proper Solution:**
   > 
   > The main purpose of this patch is to address use-after-free issues that 
can occur when a thread unexpectedly exits. For example, in scenarios such as 
proc_open(), when nxsched_get_tcb obtains the TCB but the corresponding thread 
exits at a higher priority, or during calls to functions like` 
nxclock_gettime(), sched_backtrace(), or mm_memdump(),` a similar risk exists. 
These problems should be systematically addressed by introducing critical 
sections or by employing other synchronization strategies in those critical 
paths.
   > 
   > 
   > 
   > **Not Just SMP:**
   > 
   > These use-after-free problems can occur in both SMP and UP environments — 
this is not an SMP-only issue.
   > 
   > 
   > 
   > **Limitations of Critical Sections:**
   > 
   > Even with the addition of critical sections, these issues may not be fully 
resolved. For instance, consider the following example:
   > 
   > 
   > 
   > ```
   > 
   > 
   > 
   > mm_memdump() 
   > 
   > {
   > 
   >     enter_critical_section();
   > 
   >     tcb = nxsched_get_tcb(pid);
   > 
   >     syslog();             // might wake up or schedule another thread
   > 
   >     name = get_task_name(tcb);
   > 
   >     leave_critical_section();
   > 
   > }
   > 
   > ```
   > 
   > 
   > 
   > In this scenario, even with enter_critical_section(), there can still be a 
race if syslog() internally leads to context switches that affect the TCB’s 
lifetime.
   > 
   > 
   > 
   > **On Mainstream RTOS Practice:**
   > 
   > The notion that "mainstream RTOS do not incorporate this feature, as the 
number of threads is predetermined in most scenarios" is not accurate. In a 
typical IoT system, creating and destroying threads (pthreads) is frequent and 
widespread—thread counts are rarely static. While in safety-critical automotive 
contexts thread counts might be fixed, such scenarios are the exception, not 
the rule, for consumer and general-purpose products.
   > 
   > 
   > 
   > **On Performance and Size Impact:**
   > 
   > Statements such as "it will exert an extremely detrimental impact on both 
size and performance" are not substantiated without concrete measurement or 
testing. According to the data provided by @hujun260, the actual performance 
impact is minimal; please carefully review the test report. Regarding code 
size: yes, adding additional safety checks will increase flash usage, and we 
will supplement the PR with precise flash size measurements.
   > 
   > 
   > 
   > If there is a better alternative to solve this problem, then this patch is 
unnecessary. Patches are welcome.
   
   @GUIDINGLI @hujun260 This explanation is extremely helpful for giving 
context and understanding this change. I recommend to please add this 
explanation to the PR description and also to the commit log, so that it can be 
easily understood in the future. Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to