acassis commented on PR #17468:
URL: https://github.com/apache/nuttx/pull/17468#issuecomment-3660442233
> @anchao @acassis **Patch Motivation & Proper Solution:** The main purpose
of this patch is to address use-after-free issues that can occur when a thread
unexpectedly exits. For example, in scenarios such as proc_open(), when
nxsched_get_tcb obtains the TCB but the corresponding thread exits at a higher
priority, or during calls to functions like` nxclock_gettime(),
sched_backtrace(), or mm_memdump(),` a similar risk exists. These problems
should be systematically addressed by introducing critical sections or by
employing other synchronization strategies in those critical paths.
>
> **Not Just SMP:** These use-after-free problems can occur in both SMP and
UP environments — this is not an SMP-only issue.
>
> **Limitations of Critical Sections:** Even with the addition of critical
sections, these issues may not be fully resolved. For instance, consider the
following example:
>
> ```
>
> mm_memdump()
> {
> enter_critical_section();
> tcb = nxsched_get_tcb(pid);
> syslog(); // might wake up or schedule another thread
> name = get_task_name(tcb);
> leave_critical_section();
> }
> ```
>
> In this scenario, even with enter_critical_section(), there can still be a
race if syslog() internally leads to context switches that affect the TCB’s
lifetime.
>
> **On Mainstream RTOS Practice:** The notion that "mainstream RTOS do not
incorporate this feature, as the number of threads is predetermined in most
scenarios" is not accurate. In a typical IoT system, creating and destroying
threads (pthreads) is frequent and widespread—thread counts are rarely static.
While in safety-critical automotive contexts thread counts might be fixed, such
scenarios are the exception, not the rule, for consumer and general-purpose
products.
>
> **On Performance and Size Impact:** Statements such as "it will exert an
extremely detrimental impact on both size and performance" are not
substantiated without concrete measurement or testing. According to the data
provided by @hujun260, the actual performance impact is minimal; please
carefully review the test report. Regarding code size: yes, adding additional
safety checks will increase flash usage, and we will supplement the PR with
precise flash size measurements.
>
> If there is a better alternative to solve this problem, then this patch is
unnecessary. Patches are welcome.
@GUIDINGLI thank you for bring more information to this discussion.
I think this commit will be beneficial to NuttX, but I think the data that
@hujun260 brought is not helping much: osperf results inconclusive, see:
1) why context-switch have decreased 3.36 times?
2) why hpwork decreased as well?
3) why pipe-rw had an overhead of 35% ?
4) why semwait became one 2x fast?
Seems like osperf is not reliable for this kind of test, maybe we need to
use Segger JLink RTT or OBRTrace Orbuculum to get more precise/reliable data.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]