Fix-Point commented on PR #17675: URL: https://github.com/apache/nuttx/pull/17675#issuecomment-3701557519
@GUIDINGLI suggested conducting performance tests on non-SMP, so I ran more detailed tests on `intel64:nsh`. I measured the performance of the following two restart approaches.

This is our current implementation (inline functions have already been expanded):

```c
// hrtimer_start_absolute(&hrtimer, (hrtimer_callback_t)clock_gettick_test, NULL, INT64_MAX - 1);
hrtimer.func    = clock_gettick_test;
hrtimer.arg     = NULL;
hrtimer.expired = INT64_MAX - 1; // avoid reprogramming the timer
hrtimer_async_restart(&hrtimer);
```

This is the implementation recommended by @wangchdo, who claimed it could improve performance:

```c
// hrtimer.expired is already initialized to INT64_MAX - 1 and hrtimer.arg to NULL.
hrtimer.func = clock_gettick_test; // Equivalent to modifying the state.
hrtimer_async_restart(&hrtimer);
```

As mentioned earlier, the main time overhead of `hrtimer_start` comes from reprogramming the actual hardware timer (over 1000 CPU cycles). To avoid this overhead, we inserted a timer set to trigger in 200 years for this test case. **Neither timer insertion actually programs the hardware timer**.

The test results showed that **the average overhead of both implementations, each run 1 million times, is 48 CPU cycles (~23 ns).** This demonstrates that **CPU pipelining can really hide the extra read/write overhead**. Therefore, @wangchdo's claim that his API design can improve restart performance lacks evidence.

Additionally, I measured the time consumed by disabling and re-enabling interrupts (`up_irq_save`/`up_irq_restore`) at 35 cycles. In other words, excluding synchronization, the average overhead of a restart is only about 13 cycles (queue insertion), while the overhead of `cancel + restart` (including synchronization), when no hardware timer reprogramming is involved, is only 73 cycles (~35 ns).
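For anyone who wants to reproduce the arithmetic, here is a minimal sketch of how such an average could be taken and broken down. All names in it (`average_cycles`, `residual_cycles`, `cycles_to_ns`, `fake_now`, `fake_op`) are hypothetical and not part of NuttX. The fake counter merely stands in for a real cycle counter so the sketch runs anywhere, and the 2.1 GHz clock is an assumption inferred from 48 cycles being roughly 23 ns, not a figure reported above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback types: a cycle-counter reader and the operation
 * under test. On real hardware the reader would wrap the CPU cycle counter.
 */

typedef uint64_t (*cycle_reader_t)(void);
typedef void (*bench_op_t)(void *arg);

/* Average cost of op() in cycles over `iterations` runs. The counter is
 * read only twice, so the read cost is amortized across all runs.
 */

static uint64_t average_cycles(cycle_reader_t now, bench_op_t op,
                               void *arg, uint32_t iterations)
{
  uint64_t start;
  uint32_t i;

  if (now == NULL || op == NULL || iterations == 0)
    {
      return 0;
    }

  start = now();
  for (i = 0; i < iterations; i++)
    {
      op(arg);
    }

  return (now() - start) / iterations;
}

/* Cycles left once the synchronization cost (irq save/restore) is removed. */

static inline uint32_t residual_cycles(uint32_t total, uint32_t sync)
{
  return total > sync ? total - sync : 0;
}

/* Convert cycles to nanoseconds for a core clock given in MHz. */

static inline double cycles_to_ns(uint32_t cycles, double mhz)
{
  return (double)cycles * 1000.0 / mhz;
}

/* Stand-in counter for demonstration only: each fake_op() call advances
 * the fake cycle count by the amount pointed to by arg.
 */

static uint64_t g_fake_cycles;

static uint64_t fake_now(void)
{
  return g_fake_cycles;
}

static void fake_op(void *arg)
{
  g_fake_cycles += *(const uint32_t *)arg;
}
```

With a per-call cost of 48 fed to `fake_op`, `average_cycles(fake_now, fake_op, &cost, 1000000)` returns 48, and `residual_cycles(48, 35)` gives the 13 cycles attributed to queue insertion. Under the assumed 2.1 GHz clock, `cycles_to_ns(48, 2100.0)` and `cycles_to_ns(73, 2100.0)` land close to the ~23 ns and ~35 ns quoted above.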
Compared with @wangchdo's implementation in non-SMP mode: when no reprogramming is involved, the main time overhead of both implementations lies in queue insertion and deletion. Since both use the same RB-Tree implementation, that overhead is identical. However, because restart in this implementation has fewer conditional branches, its actual runtime performance should theoretically be slightly better than that of @wangchdo's implementation, as it involves fewer branch predictions and fewer branch misprediction rollbacks. At the same time, we don't need to add a `state` or `reference count` field to hrtimer, so its memory overhead is also lower than that of his implementation.
