Fix-Point opened a new pull request, #17276:
URL: https://github.com/apache/nuttx/pull/17276

   ## Summary
   
   This PR proposes ClockDevice, a new timer driver abstraction for NuttX. The 
new ClockDevice timer hardware abstraction delivers:
   - Functional correctness: Thread-safe and overflow-free
   - High performance: Up to 3x faster on ARMv7A platforms
   - Theoretically optimal timing precision: Uses hardware cycle counts as the 
time unit.
   - Time-unit independent interfaces: Decouples timer drivers from OS time 
subsystems
   - Minimalist driver implementation: Reduces driver code size by nearly 70%.
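
To illustrate what "overflow-free" can mean when hardware cycle counts are the time unit, here is a minimal sketch of converting a cycle count to nanoseconds without overflowing the 64-bit intermediate product. The helper name `cycles_to_ns` and its signature are hypothetical and are not taken from this PR; the actual implementation may differ.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/* Hypothetical helper (not from the PR): convert a hardware cycle count
 * to nanoseconds.  The naive form, cycles * NSEC_PER_SEC / freq_hz,
 * overflows 64 bits once cycles exceeds roughly 1.8e10.  Splitting the
 * count into whole seconds and a remainder keeps every intermediate
 * product in range: rem < freq_hz < 2^32, so rem * 1e9 < 2^62.
 * freq_hz must be non-zero.
 */

static uint64_t cycles_to_ns(uint64_t cycles, uint32_t freq_hz)
{
  uint64_t sec = cycles / freq_hz;  /* Whole seconds */
  uint64_t rem = cycles % freq_hz;  /* Leftover cycles, below one second */

  return sec * NSEC_PER_SEC + rem * NSEC_PER_SEC / freq_hz;
}
```

With an assumed 24 MHz counter, `cycles_to_ns(36000000, 24000000)` yields 1.5 seconds in nanoseconds, and a count equal to one million seconds still converts correctly even though the naive product would not fit in 64 bits.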
   
   ## Impact
   
   These commits affect the timing subsystem, as well as the following 
architectures:
   - arm-v7a/arm-v7r/arm-v8r
   - arm-v8a
   - riscv
   - sim
   - tricore
   - intel64
   
   ## Testing
   
   To evaluate the performance improvements, we conducted tests on three 
platforms: qemu-intel64/KVM, imx8qm-mek/arm64, and qemu-armv7a. We measured the 
CPU cycle overhead for:
   
   - Reading the current time
   - Setting a timer
   - Handling a timer callback
   
   Each operation was executed 10 million times, and the results were averaged.
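
The measurement procedure described above can be sketched roughly as follows. This is an assumed harness, not the benchmark code used for the PR: `bench_avg_cycles`, the callback types, and the `demo_*` stubs are hypothetical names. On real hardware the cycle reader would wrap a platform counter, and the operation would call clock_gettime or wd_start.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback types (not from the PR) */

typedef uint64_t (*cycle_reader_t)(void);
typedef void (*bench_op_t)(void *arg);

/* Run op() `iterations` times and return the average cycle cost.
 * The counter is read once before and once after the loop, so the
 * cost of reading it is amortized across all iterations.
 */

static uint64_t bench_avg_cycles(cycle_reader_t read_cycles, bench_op_t op,
                                 void *arg, uint32_t iterations)
{
  uint64_t start;
  uint64_t end;
  uint32_t i;

  if (iterations == 0)
    {
      return 0;
    }

  start = read_cycles();
  for (i = 0; i < iterations; i++)
    {
      op(arg);
    }

  end = read_cycles();
  return (end - start) / iterations;
}

/* Deterministic stubs for demonstration only: the fake operation
 * advances a fake counter by 7 "cycles" per call.
 */

static uint64_t g_demo_cycles;

static uint64_t demo_read_cycles(void)
{
  return g_demo_cycles;
}

static void demo_op(void *arg)
{
  (void)arg;
  g_demo_cycles += 7;
}
```

Calling `bench_avg_cycles(demo_read_cycles, demo_op, NULL, 1000)` with the stubs reports the 7 cycles per call that the fake operation adds.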
   
   The results demonstrated significant performance improvements.
   
   On qemu-armv7a, software division is the performance bottleneck. 
ClockDevice brings significant performance improvements:
   - clock_gettime: 2.56x
   - clock_systime_ticks: 3.00x
   - wd_start: 1.67x
   - wd_start_cancel: 1.43x
   - timer expiration: 2.36x
   <img width="579" height="435" alt="image" 
src="https://github.com/user-attachments/assets/dc7394c2-0cc8-4c99-81a8-2920f14fbc2b" />
   
   On qemu-intel64/KVM, ClockDevice achieved up to a 1.42x performance 
improvement. Notably, for the clock_gettime API, NuttX with the ClockDevice 
improvements performed 1.31x better than Linux kernel 6.8.0-51.
   <img width="597" height="448" alt="image" 
src="https://github.com/user-attachments/assets/5c3a2dba-5823-41f3-9e73-17f0327de25c" />
   
   On imx8qm-mek/arm64, ClockDevice achieved up to a 1.68x performance 
improvement. Note that on early ARM64 platforms, the INVDIV optimization does 
not work well, since the hardware division instruction UDIV costs fewer CPU 
cycles on average than INVDIV.
   <img width="607" height="455" alt="image" 
src="https://github.com/user-attachments/assets/576edd31-8a26-4a1c-8a5f-c86b8402157a" />
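
For readers unfamiliar with the idea, the following sketch assumes that INVDIV refers to division by an invariant divisor, where a reciprocal is precomputed once and each later division becomes a multiply and a shift. This message does not describe how INVDIV is implemented, so the function names and the 32-bit formulation below are illustrative assumptions based on the well-known multiply-by-reciprocal technique, not the PR's code.

```c
#include <stdint.h>

/* Hypothetical sketch (not from the PR): exact unsigned 32-bit division
 * by a divisor that is fixed at initialization time, such as a timer
 * frequency.
 */

/* Precompute the 64-bit magic value ceil(2^64 / d).  Requires d > 1. */

static uint64_t invdiv_init(uint32_t d)
{
  return UINT64_MAX / d + 1;
}

/* Return n / d, where magic == invdiv_init(d).  This computes the high
 * 64 bits of the 96-bit product magic * n using only 64-bit arithmetic,
 * so no division instruction or software division routine is needed.
 */

static uint32_t invdiv_u32(uint32_t n, uint64_t magic)
{
  uint64_t hi = (magic >> 32) * n;          /* Upper half times n */
  uint64_t lo = (magic & 0xffffffffu) * n;  /* Lower half times n */

  return (uint32_t)((hi + (lo >> 32)) >> 32);
}
```

On a core without a hardware divider, the software division routine is replaced by two multiplications, an addition, and shifts. On a core with a fast UDIV instruction, the extra multiplications can cost more than the division itself, which matches the imx8qm-mek observation above.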
   
   ## Plan
   
   Due to the need for extensive code modifications, some work remains 
unfinished. The following is a list of the planned tasks.
   
   - [x] 1. Simplify the timer drivers and add SMP initialization.
   - [x] 2. Remove the callback and arg from the oneshot API.
   - [x] 3. Remove tick-based oneshot API.
   - [x] 4. Add new count-based oneshot API (CLKDEV).
   - [x] 5. Introduce optimized fast-path for count-based oneshot API.
   - [x] 6. Reimplement the commonly used timer drivers with the count-based 
oneshot API.
   - [WIP] 7. Inline arch_alarm for performance (3% to 5% less execution time 
for clock_gettime, wd_start and wd_expiration).
   
   Architecture support:
   - [x] arm-v7a/v7r/v8r generic timer
   - [x] goldfish
   - [x] arm-v8a generic timer
   - [x] sim
   - [x] intel64 tsc
   - [x] tricore systimer
   - [WIP/Soon] risc-v 

