Re: [PR] add a reference count to the TCB to prevent it from being deleted. [nuttx]

via GitHub Tue, 16 Dec 2025 05:08:35 -0800


acassis commented on PR #17468:
URL: https://github.com/apache/nuttx/pull/17468#issuecomment-3660442233


   > @anchao @acassis **Patch Motivation & Proper Solution:** The main purpose 
of this patch is to address use-after-free issues that can occur when a thread 
unexpectedly exits. For example, in scenarios such as proc_open(), when 
nxsched_get_tcb obtains the TCB but the corresponding thread exits at a higher 
priority, or during calls to functions like` nxclock_gettime(), 
sched_backtrace(), or mm_memdump(),` a similar risk exists. These problems 
should be systematically addressed by introducing critical sections or by 
employing other synchronization strategies in those critical paths.
   > 
   > **Not Just SMP:** These use-after-free problems can occur in both SMP and 
UP environments — this is not an SMP-only issue.
   > 
   > **Limitations of Critical Sections:** Even with the addition of critical 
sections, these issues may not be fully resolved. For instance, consider the 
following example:
   > 
   > ```
   > 
   > mm_memdump() 
   > {
   >     enter_critical_section();
   >     tcb = nxsched_get_tcb(pid);
   >     syslog();             // might wake up or schedule another thread
   >     name = get_task_name(tcb);
   >     leave_critical_section();
   > }
   > ```
   > 
   > In this scenario, even with enter_critical_section(), there can still be a 
race if syslog() internally leads to context switches that affect the TCB’s 
lifetime.
   > 
   > **On Mainstream RTOS Practice:** The notion that "mainstream RTOS do not 
incorporate this feature, as the number of threads is predetermined in most 
scenarios" is not accurate. In a typical IoT system, creating and destroying 
threads (pthreads) is frequent and widespread—thread counts are rarely static. 
While in safety-critical automotive contexts thread counts might be fixed, such 
scenarios are the exception, not the rule, for consumer and general-purpose 
products.
   > 
   > **On Performance and Size Impact:** Statements such as "it will exert an 
extremely detrimental impact on both size and performance" are not 
substantiated without concrete measurement or testing. According to the data 
provided by @hujun260, the actual performance impact is minimal; please 
carefully review the test report. Regarding code size: yes, adding additional 
safety checks will increase flash usage, and we will supplement the PR with 
precise flash size measurements.
   > 
   > If there is a better alternative to solve this problem, then this patch is 
unnecessary. Patches are welcome.
   
   @GUIDINGLI thank you for bring more information to this discussion.
   I think this commit will be beneficial to NuttX, but I think the data that 
@hujun260 brought is not helping much: osperf results inconclusive, see:
   1) why context-switch have decreased 3.36 times?
   2) why hpwork decreased as well?
   3) why pipe-rw had an overhead of 35% ?
   4) why semwait became one 2x fast?
   
   Seems like osperf is not reliable for this kind of test, maybe we need to 
use Segger JLink RTT or OBRTrace Orbuculum to get more precise/reliable data. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] add a reference count to the TCB to prevent it from being deleted. [nuttx]

Reply via email to