hujun260 commented on PR #17468:
URL: https://github.com/apache/nuttx/pull/17468#issuecomment-3677382071
> > @anchao @acassis **Patch Motivation & Proper Solution:** The main purpose of this patch is to address use-after-free issues that can occur when a thread exits unexpectedly. For example, in `proc_open()`, `nxsched_get_tcb()` may obtain the TCB just before the corresponding thread exits at a higher priority; a similar risk exists during calls to functions such as `nxclock_gettime()`, `sched_backtrace()`, or `mm_memdump()`. These problems should be addressed systematically, by introducing critical sections or by employing other synchronization strategies in those critical paths.
> > **Not Just SMP:** These use-after-free problems can occur in both SMP
and UP environments — this is not an SMP-only issue.
> > **Limitations of Critical Sections:** Even with the addition of critical
sections, these issues may not be fully resolved. For instance, consider the
following example:
> > ```
> > mm_memdump()
> > {
> >   enter_critical_section();
> >   tcb = nxsched_get_tcb(pid);
> >   syslog();                 // might wake up or schedule another thread
> >   name = get_task_name(tcb);
> >   leave_critical_section();
> > }
> > ```
> >
> > In this scenario, even with enter_critical_section(), there can still be a race: if syslog() internally leads to a context switch, the thread that owns the TCB can exit, and the subsequent get_task_name(tcb) then dereferences freed memory.
> > **On Mainstream RTOS Practice:** The claim that "mainstream RTOS do not incorporate this feature, as the number of threads is predetermined in most scenarios" is not accurate. In a typical IoT system, threads (pthreads) are created and destroyed frequently and widely; thread counts are rarely static. While thread counts might be fixed in safety-critical automotive contexts, such scenarios are the exception, not the rule, for consumer and general-purpose products.
> > **On Performance and Size Impact:** Statements such as "it will exert an
extremely detrimental impact on both size and performance" are not
substantiated without concrete measurement or testing. According to the data
provided by @hujun260, the actual performance impact is minimal; please
carefully review the test report. Regarding code size: yes, adding additional
safety checks will increase flash usage, and we will supplement the PR with
precise flash size measurements.
> > If there is a better alternative to solve this problem, then this patch
is unnecessary. Patches are welcome.
>
> You can try running this code. The `nxsched_get_tcb()` function incurs a 2.5~4% performance overhead, and this function is used throughout NuttX:
>
> ```
> #include <nuttx/config.h>
> #include <stdio.h>
> #include <stdint.h>
> #include <time.h>
> #include <pthread.h>
>
> /****************************************************************************
>  * Public Functions
>  ****************************************************************************/
>
> /****************************************************************************
>  * hello_main
>  ****************************************************************************/
>
> pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;
>
> static void timespec_diff(const struct timespec *start,
>                           const struct timespec *end,
>                           struct timespec *diff)
> {
>   diff->tv_sec  = end->tv_sec - start->tv_sec;
>   diff->tv_nsec = end->tv_nsec - start->tv_nsec;
>
>   if (diff->tv_nsec < 0)
>     {
>       diff->tv_sec--;
>       diff->tv_nsec += 1000000000;
>     }
> }
>
> static void timespec_add(struct timespec *total, const struct timespec *diff)
> {
>   total->tv_sec  += diff->tv_sec;
>   total->tv_nsec += diff->tv_nsec;
>
>   if (total->tv_nsec >= 1000000000)
>     {
>       total->tv_sec += total->tv_nsec / 1000000000;
>       total->tv_nsec = total->tv_nsec % 1000000000;
>     }
> }
>
> static void timespec_avg(const struct timespec *total, int count,
>                          struct timespec *avg)
> {
>   uint64_t total_ns = (uint64_t)total->tv_sec * 1000000000 + total->tv_nsec;
>   uint64_t avg_ns   = total_ns / count;
>
>   avg->tv_sec  = avg_ns / 1000000000;
>   avg->tv_nsec = avg_ns % 1000000000;
> }
>
> int main(int argc, char *argv[])
> {
>   struct timespec start;
>   struct timespec end;
>   struct timespec diff;
>   struct timespec total = {0, 0};
>   struct timespec avg;
>   int i, j = 0;
>   const int loop_count = 10;
>
>   while (j < loop_count)
>     {
>       i = 0;
>       j++;
>
>       clock_gettime(CLOCK_BOOTTIME, &start);
>       pthread_mutex_lock(&g_mutex);
>       while (i < 1000 * 1000)
>         {
>           i++;
>           pthread_mutex_trylock(&g_mutex);
>         }
>       pthread_mutex_unlock(&g_mutex);
>
>       clock_gettime(CLOCK_BOOTTIME, &end);
>
>       timespec_diff(&start, &end, &diff);
>       timespec_add(&total, &diff);
>       timespec_avg(&total, j, &avg);
>
>       printf("%d: diff = %lu.%09lu s | avg = %lu.%09lu s\n",
>              j,
>              (unsigned long)diff.tv_sec, (unsigned long)diff.tv_nsec,
>              (unsigned long)avg.tv_sec, (unsigned long)avg.tv_nsec);
>     }
>
>   printf("\n===== result =====\n");
>   printf("count: %d\n", loop_count);
>   printf("total: %lu.%09lu s\n",
>          (unsigned long)total.tv_sec, (unsigned long)total.tv_nsec);
>   printf("avg: %lu.%09lu s\n",
>          (unsigned long)avg.tv_sec, (unsigned long)avg.tv_nsec);
>
>   return 0;
> }
> ```
>
> Additionally, I conducted the validation on the sim host PC; running this
test on an MCU may result in even greater performance degradation:
>
> Before this change:
>
> ```
> nsh> hello
> 1: diff = 0.306540108 s | avg = 0.306540108 s
> 2: diff = 0.297250552 s | avg = 0.301895330 s
> 3: diff = 0.296764389 s | avg = 0.300185016 s
> 4: diff = 0.297345113 s | avg = 0.299475040 s
> 5: diff = 0.296678248 s | avg = 0.298915682 s
> 6: diff = 0.296436913 s | avg = 0.298502553 s
> 7: diff = 0.296414429 s | avg = 0.298204250 s
> 8: diff = 0.296148430 s | avg = 0.297947272 s
> 9: diff = 0.296353964 s | avg = 0.297770238 s
> 10: diff = 0.296554913 s | avg = 0.297648705 s
>
> ===== result =====
> count: 10
> total: 2.976487059 s
> avg: 0.297648705 s
> ```
>
> After this change:
>
> ```
> nsh> hello
> 1: diff = 0.322894500 s | avg = 0.322894500 s
> 2: diff = 0.304714785 s | avg = 0.313804642 s
> 3: diff = 0.303939787 s | avg = 0.310516357 s
> 4: diff = 0.304068904 s | avg = 0.308904494 s
> 5: diff = 0.304900920 s | avg = 0.308103779 s
> 6: diff = 0.304780429 s | avg = 0.307549887 s
> 7: diff = 0.304122773 s | avg = 0.307060299 s
> 8: diff = 0.304266724 s | avg = 0.306711102 s
> 9: diff = 0.304566429 s | avg = 0.306472805 s
> 10: diff = 0.303873630 s | avg = 0.306212888 s
>
> ===== result =====
> count: 10
> total: 3.062128881 s
> avg: 0.306212888 s
> ```
I've added the `nxsched_get_tcb_noref` interface, whose performance is the same as before, and I've also reimplemented `nxsched_verify_pid`.
Furthermore, I retested the scenario you mentioned, and there is no performance degradation at all.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]