Fix-Point commented on PR #17675:
URL: https://github.com/apache/nuttx/pull/17675#issuecomment-3701557519

   @GUIDINGLI suggested running performance tests on a non-SMP configuration, 
so I performed more detailed tests on `intel64:nsh`.  
   
   I actually measured the performance of the following two restart approaches: 
 
   
   This is our current implementation (inline functions have already been 
expanded):  
   ```c
   // hrtimer_start_absolute(&hrtimer, (hrtimer_callback_t)clock_gettick_test, NULL, INT64_MAX - 1);
   hrtimer.func = clock_gettick_test;
   hrtimer.arg  = NULL;
   hrtimer.expired = INT64_MAX - 1; // avoid reprogramming the timer
   hrtimer_async_restart(&hrtimer); 
   ```  
   
   This is the implementation recommended by @wangchdo, who claimed it could 
improve performance.  
   ```c
   // We have already initialized hrtimer.expired with INT64_MAX - 1 and
   // hrtimer.arg with NULL.
   hrtimer.func = clock_gettick_test; // Equivalent to modifying the state.
   hrtimer_async_restart(&hrtimer); 
   ```  
   
   As mentioned earlier, the main time overhead of `hrtimer_start` comes from 
reprogramming the actual hardware timer (over 1000 CPU cycles). To avoid this 
overhead, we inserted a timer set to trigger in 200 years for this test case. 
**Neither timer insertion actually programs the hardware timer**. 
   
   The test results showed that **the average overhead of both implementations, 
each run 1 million times, is 48 CPU cycles (~23 ns).** This demonstrates 
that **CPU pipelining can really hide the extra read/write overhead**. 
Therefore, @wangchdo's claim that his API design can improve restart 
performance lacks evidence.  
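
   For reference, an averaging loop of the kind described above might look 
like the following minimal sketch. It is not the benchmark used for these 
numbers: `op_under_test`, `now_ns` and `average_ns` are hypothetical names, and 
it reads POSIX `clock_gettime(CLOCK_MONOTONIC)` instead of a cycle counter, so 
the result is in nanoseconds and includes loop and call overhead.  

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the operation being timed (for example, one
 * restart call). The volatile counter keeps the loop from being removed.
 */

static volatile uint64_t g_sink;

static void op_under_test(void)
{
  g_sink++;
}

/* Read a monotonic clock in nanoseconds (assumption: POSIX clock, not a
 * hardware cycle counter).
 */

static uint64_t now_ns(void)
{
  struct timespec ts;

  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Run op() `iterations` times and return the average cost per call in
 * nanoseconds. Loop and indirect-call overhead are included in the result.
 */

static double average_ns(void (*op)(void), uint32_t iterations)
{
  uint64_t start;
  uint64_t end;
  uint32_t i;

  assert(iterations > 0);

  start = now_ns();
  for (i = 0; i < iterations; i++)
    {
      op();
    }

  end = now_ns();
  return (double)(end - start) / (double)iterations;
}
```

   Calling `average_ns(op_under_test, 1000000)` mirrors the 1-million-iteration 
averaging; averaging over many runs is what makes a per-call cost of a few tens 
of cycles measurable at all.  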
   
   Additionally, I measured that the time consumed by enabling/disabling 
interrupts (`up_irq_save`/`up_irq_restore`) is 35 cycles. In other words, 
excluding synchronization, the average overhead for restart is only about 13 
cycles (queue insertion), while the overhead for `cancel + restart` (including 
the synchronization), when no hardware timer reprogramming is involved, is only 
73 cycles (~35 ns).  
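
   The breakdown above is plain arithmetic on the measured figures, sketched 
below. The macro and function names are illustrative only, and the clock rate 
is not a measured value: it is inferred from the "48 cycles (~23 ns)" pair, 
which implies roughly 2.09 GHz.  

```c
#include <stdint.h>

/* Averages quoted in this comment, in CPU cycles. */

#define RESTART_TOTAL_CYCLES     48u  /* restart, including irq save/restore */
#define IRQ_SAVE_RESTORE_CYCLES  35u  /* up_irq_save + up_irq_restore */
#define CANCEL_RESTART_CYCLES    73u  /* cancel + restart, including sync */

/* Restart cost once the synchronization cost is subtracted. */

static uint32_t restart_without_sync(void)
{
  return RESTART_TOTAL_CYCLES - IRQ_SAVE_RESTORE_CYCLES;
}

/* Clock rate in GHz implied by "48 cycles (~23 ns)". This is an inference
 * from the quoted numbers, not a measured frequency.
 */

static double implied_ghz(void)
{
  return 48.0 / 23.0;
}

/* Convert a cycle count to nanoseconds at the implied clock rate. */

static double cycles_to_ns(uint32_t cycles)
{
  return (double)cycles / implied_ghz();
}
```

   With these figures, `restart_without_sync()` gives the 13 cycles attributed 
to queue insertion, and `cycles_to_ns(CANCEL_RESTART_CYCLES)` lands at about 
35 ns, consistent with the numbers quoted above.  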
   
   Compared with @wangchdo's implementation in non-SMP mode, when no 
reprogramming is involved, the main time overhead of both implementations lies 
in queue insertion and deletion. Since both use the same RB-tree 
implementation, that overhead is identical. However, because restart in this 
implementation has fewer conditional branches, its actual runtime performance 
should theoretically be slightly better than @wangchdo's implementation, as it 
involves fewer branch predictions and fewer misprediction rollbacks. In 
addition, we don't need to add `state` or `reference count` to hrtimer, so its 
memory overhead is also lower than that of @wangchdo's implementation.

