W-M-R opened a new pull request, #18337:
URL: https://github.com/apache/nuttx/pull/18337

   ## Summary
   
   This PR introduces a comprehensive Performance Monitoring Unit (PMU) and 
`perf` event subsystem for NuttX, bringing Linux-like performance profiling 
capabilities to the RTOS. This implementation enables system-wide and per-task 
performance analysis, which is critical for optimizing real-time applications 
and understanding system behavior.
   
   **Key Features:**
   - **Core perf event infrastructure**: Implements a complete perf event 
framework integrated with the NuttX scheduler, supporting both hardware and 
software events
   - **ARM64 PMU driver**: Hardware Performance Monitoring Unit support for 
ARM64 (Cortex-R82, ARMv8-A) with interrupt-driven sampling
   - **Software events**: CPU clock, task clock, context switches, and other 
software-based performance counters
   - **Sampling and profiling**: Call chain sampling with configurable periods, 
enabling flame graph generation and hotspot analysis
   - **SMP support**: Multi-core sampling and per-CPU event tracking
   - **Symbol resolution**: Integration with ELF symbol tables for host-side 
analysis with standard perf tools
   - **Flexible APIs**: Similar to Linux perf_event_open(), allowing userspace 
to create, enable, disable, and read performance counters
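To illustrate the difference between counting and sampling with a configurable period, here is a minimal sketch in plain C. The `sketch_event` structure and its functions are hypothetical names made up for this example; they are not the data structures or APIs added by this PR. The sketch assumes that a period of 0 means "count only" and that a sample becomes due each time the period elapses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event state for illustration only (not this PR's types). */

struct sketch_event
{
  uint64_t count;          /* Total events observed (counting mode) */
  uint64_t sample_period;  /* 0 = counting only, N = sample every N events */
  uint64_t period_left;    /* Events remaining until the next sample */
  uint64_t nr_samples;     /* Samples that have become due so far */
};

static void sketch_event_init(struct sketch_event *ev, uint64_t period)
{
  assert(ev != NULL);
  ev->count         = 0;
  ev->sample_period = period;
  ev->period_left   = period;
  ev->nr_samples    = 0;
}

/* Account for n new events and return how many samples became due.
 * With real hardware this work would be triggered by the counter
 * overflow interrupt; here it is a plain function call.
 */

static uint64_t sketch_event_add(struct sketch_event *ev, uint64_t n)
{
  uint64_t due = 0;

  assert(ev != NULL);
  ev->count += n;

  if (ev->sample_period == 0)
    {
      return 0;  /* Counting mode: the value is only read back later */
    }

  while (n >= ev->period_left)
    {
      n              -= ev->period_left;
      ev->period_left = ev->sample_period;  /* Re-arm for the next period */
      due++;
    }

  ev->period_left -= n;
  ev->nr_samples  += due;
  return due;
}
```

With a period of 100, adding 250 events yields two due samples and leaves 50 events until the next one. A shorter period gives finer resolution at the cost of more sampling work.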
   
   **Why this change is necessary:**
   Performance analysis is essential for optimizing embedded systems. This 
implementation provides:
   - Zero-overhead performance counters when not in use
   - Minimal overhead when active (interrupt-driven sampling)
   - Standard interface compatible with existing perf tooling
   - Deep insights into CPU usage, cache behavior, and scheduling patterns
   
   **Technical approach:**
   1. Scheduler integration: Hooks in task creation, exit, and context switch 
paths
   2. Architecture-specific PMU drivers (ARM64 PMUv3 initially)
   3. Ring buffer for efficient event data collection
   4. File descriptor-based API for userspace control
   5. Support for both counting (read values) and sampling (periodic 
interrupts) modes
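The ring buffer in step 3 can be pictured with a small single-producer, single-consumer byte ring. This is a sketch under stated assumptions, not the buffer implemented in this PR: the names (`sketch_rb`, `SKETCH_RB_SIZE`), the fixed 64-byte size, and the policy of dropping a record that does not fit are all chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_RB_SIZE 64u  /* Hypothetical size; must be a power of two */

struct sketch_rb
{
  uint32_t head;                  /* Total bytes written by the producer */
  uint32_t tail;                  /* Total bytes read by the consumer */
  uint8_t  data[SKETCH_RB_SIZE];  /* Storage, indexed modulo the size */
};

static void sketch_rb_init(struct sketch_rb *rb)
{
  assert(rb != NULL);
  memset(rb, 0, sizeof(*rb));
}

/* Bytes currently stored. Unsigned subtraction stays correct even after
 * head and tail wrap around, because the size is a power of two.
 */

static size_t sketch_rb_used(const struct sketch_rb *rb)
{
  return (size_t)(uint32_t)(rb->head - rb->tail);
}

/* Write a whole record or nothing. Returns false (record dropped) when
 * there is not enough free space.
 */

static bool sketch_rb_write(struct sketch_rb *rb, const void *rec,
                            size_t len)
{
  const uint8_t *src = rec;
  size_t i;

  if (len > SKETCH_RB_SIZE - sketch_rb_used(rb))
    {
      return false;
    }

  for (i = 0; i < len; i++)
    {
      rb->data[(rb->head + i) & (SKETCH_RB_SIZE - 1)] = src[i];
    }

  rb->head += (uint32_t)len;
  return true;
}

/* Read exactly len bytes. Returns false if fewer bytes are available. */

static bool sketch_rb_read(struct sketch_rb *rb, void *rec, size_t len)
{
  uint8_t *dst = rec;
  size_t i;

  if (len > sketch_rb_used(rb))
    {
      return false;
    }

  for (i = 0; i < len; i++)
    {
      dst[i] = rb->data[(rb->tail + i) & (SKETCH_RB_SIZE - 1)];
    }

  rb->tail += (uint32_t)len;
  return true;
}
```

Keeping `head` and `tail` as free-running counters avoids a separate "full" flag. A real kernel-to-user buffer would also need memory barriers or interrupt masking around the index updates, which this sketch leaves out.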
   
   **Files changed:** 51 files with 6368 insertions across scheduler, 
arch/arm64, drivers, and include directories
    
   
   ## Impact
   
   **Users:**
   - **New capability**: Users can now profile NuttX applications using 
perf-like tools
   - **Debugging aid**: Easier identification of performance bottlenecks and 
inefficient code paths
   - **Transparent**: No impact on applications unless explicitly enabled via 
CONFIG_SCHED_PERF
   
   **Build system:**
   - **Optional feature**: Guarded by CONFIG_SCHED_PERF and related Kconfig 
options
   - **No default impact**: Disabled by default, users must explicitly enable 
in configuration
   - **Architecture specific**: PMU hardware support initially for ARM64 
(Cortex-R82, ARMv8-A)
   
   **Hardware:**
   - **ARM64 PMU**: Utilizes hardware performance counters on supported ARM64 
cores
   - **Interrupt-driven**: Uses PMU overflow interrupts for sampling
   - **Software fallback**: Software events work on any architecture
   
   **Documentation:**
   - New APIs in `include/nuttx/perf.h`
   - Configuration options in drivers/perf/Kconfig and sched/Kconfig
   
   **Security:**
   - Access control through file permissions (future enhancement)
   - Kernel memory protection maintained
   - No security regressions introduced
   
   **Compatibility:**
   - **Backward compatible**: Existing applications unaffected when 
CONFIG_SCHED_PERF disabled
   - **ABI stable**: No changes to existing system call interfaces
   - **Portable**: Core framework architecture-independent; PMU drivers are 
architecture-specific
   
   **Configuration dependencies:**
   - Requires CONFIG_SCHED_PERF for core framework
   - Requires CONFIG_ARM64_PMU for ARM64 hardware support
   - Optional CONFIG_SCHED_INSTRUMENTATION integration
   
   ## Testing
   
   Built and tested with the arm64:nsh config.
   
   `perf stat` collects performance counter statistics:
   
   ```
   NuttShell (NSH) NuttX-12.12.0
   nsh> 
   nsh> 
   nsh> perf stat hello
   No command specified. Monitoring new process.
   result = 1
   
    Performance counter stats
   
   4997684305       cycles
       200000       stalled-cycles-frontend
            0       stalled-cycles-backend
   
    5.010000000 seconds time elapsed
   nsh> 
   ```
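As a sanity check on the `perf stat` output above, the counters can be turned into derived figures. The helpers below are a hypothetical sketch (the function names are not part of this PR), and the interpretation assumes the cycle counter ran on a single CPU for the whole elapsed interval. With the values shown, 4997684305 cycles over 5.01 seconds works out to roughly 997.5 MHz, and 200000 front-end stall cycles are about 0.004% of all cycles.

```c
#include <assert.h>
#include <stdint.h>

/* Average cycle rate in Hz over the measured interval. */

static double sketch_effective_hz(uint64_t cycles, double elapsed_s)
{
  assert(elapsed_s > 0.0);
  return (double)cycles / elapsed_s;
}

/* Fraction of 'total' accounted for by 'part' (0.0 if total is zero). */

static double sketch_ratio(uint64_t part, uint64_t total)
{
  return total == 0 ? 0.0 : (double)part / (double)total;
}
```

For example, `sketch_effective_hz(4997684305ULL, 5.01)` gives the average cycle rate, and `sketch_ratio(200000, 4997684305ULL)` gives the front-end stall fraction.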
   
   `perf record` records samples for a command and writes them to `perf.data`:
   
   ```
   nsh> cd /tmp
   nsh> ls 
   /tmp:
   nsh> perf record -e cycles hello
   perf pid 7
   No command specified. Monitoring new process.
   result = 1
   receive signal 17
   nsh> ls -l
   /tmp:
    -rwxrwxrwx      102392 perf.data
   nsh> 
   ```
   Copy `perf.data` to your host PC by any available method, then run:  
   `./nuttx/tools/perfaddmmap.py -f ./perf.data -e ./out/nuttx_qemu-armv8a_nsh/nuttx`  
   The script replaces your `perf.data` with an updated file. Next, run:  
   `perf report`  
   It displays output like the following:
   <img width="616" height="188" alt="image" src="https://github.com/user-attachments/assets/c7adacdf-0da1-4002-8067-203a35d76348" />
   ❗ **Note:** Some issues remain in this version of perf because it depends on the implementation of other patches. I will continue to fix the remaining issues as those patches are incorporated.

