W-M-R opened a new pull request, #18337:
URL: https://github.com/apache/nuttx/pull/18337
*Note: Please adhere to [Contributing Guidelines](https://github.com/apache/nuttx/blob/master/CONTRIBUTING.md).*

## Summary

This PR introduces a Performance Monitoring Unit (PMU) and `perf` event subsystem for NuttX, bringing Linux-like performance profiling capabilities to the RTOS. It enables both system-wide and per-task performance analysis, which is important for optimizing real-time applications and understanding system behavior.

**Key Features:**

- **Core perf event infrastructure**: A perf event framework integrated with the NuttX scheduler, supporting both hardware and software events
- **ARM64 PMU driver**: Hardware Performance Monitoring Unit support for ARM64 (Cortex-R82, ARMv8-A) with interrupt-driven sampling
- **Software events**: CPU clock, task clock, context switches, and other software-based performance counters
- **Sampling and profiling**: Call-chain sampling with configurable periods, enabling flame graph generation and hotspot analysis
- **SMP support**: Multi-core sampling and per-CPU event tracking
- **Symbol resolution**: Integration with ELF symbol tables so that recorded data can be analyzed on the host with standard perf tools
- **Flexible APIs**: An interface similar to Linux `perf_event_open()`, allowing userspace to create, enable, disable, and read performance counters

**Why this change is necessary:**

Performance analysis is essential for optimizing embedded systems. This implementation provides:

- Zero overhead when performance counters are not in use
- Minimal overhead when active (interrupt-driven sampling)
- A standard interface compatible with existing perf tooling
- Insight into CPU usage, cache behavior, and scheduling patterns

**Technical approach:**

1. Scheduler integration: hooks in the task creation, task exit, and context-switch paths
2. Architecture-specific PMU drivers (initially ARM64 PMUv3)
3. A ring buffer for efficient event data collection
4. A file-descriptor-based API for userspace control
5. Support for both counting mode (reading counter values) and sampling mode (periodic overflow interrupts)

**Files changed:** 51 files with 6368 insertions across the scheduler, `arch/arm64`, drivers, and include directories

## Impact

**Users:**

- **New capability**: Users can profile NuttX applications with perf-like tools
- **Debugging aid**: Performance bottlenecks and inefficient code paths are easier to identify
- **Transparent**: Applications are unaffected unless the feature is explicitly enabled via `CONFIG_SCHED_PERF`

**Build system:**

- **Optional feature**: Guarded by `CONFIG_SCHED_PERF` and related Kconfig options
- **No default impact**: Disabled by default; users must explicitly enable it in their configuration
- **Architecture-specific**: PMU hardware support is initially limited to ARM64 (Cortex-R82, ARMv8-A)

**Hardware:**

- **ARM64 PMU**: Uses the hardware performance counters on supported ARM64 cores
- **Interrupt-driven**: Uses PMU overflow interrupts for sampling
- **Software fallback**: Software events work on any architecture

**Documentation:**

- New APIs in `include/nuttx/perf.h`
- Configuration options in `drivers/perf/Kconfig` and `sched/Kconfig`

**Security:**

- Access control through file permissions (future enhancement)
- Kernel memory protection is maintained
- No security regressions introduced

**Compatibility:**

- **Backward compatible**: Existing applications are unaffected when `CONFIG_SCHED_PERF` is disabled
- **ABI stable**: No changes to existing system call interfaces
- **Portable**: The core framework is architecture-independent; PMU drivers are architecture-specific

**Configuration dependencies:**

- `CONFIG_SCHED_PERF` is required for the core framework
- `CONFIG_ARM64_PMU` is required for ARM64 hardware support
- `CONFIG_SCHED_INSTRUMENTATION` integration is optional

## Testing

Built with the `arm64:nsh` configuration.

`perf stat` runs a command and reports performance counter statistics:

```
NuttShell (NSH) NuttX-12.12.0
nsh>
nsh>
nsh> perf stat hello
No command specified. Monitoring new process.
result = 1
Performance counter stats
4997684305 cycles
200000 stalled-cycles-frontend
0 stalled-cycles-backend
5.010000000 seconds time elapsed
nsh>
```

`perf record` samples a command and writes the profile to `perf.data`:

```
nsh> cd /tmp
nsh> ls
/tmp:
nsh> perf record -e cycles hello
perf pid 7
No command specified. Monitoring new process.
result = 1
receive signal 17
nsh> ls -l
/tmp:
-rwxrwxrwx 102392 perf.data
nsh>
```

Copy `perf.data` to your host PC using any convenient method, then run:

`./nuttx/tools/perfaddmmap.py -f ./perf.data -e ./out/nuttx_qemu-armv8a_nsh/nuttx`

The script rewrites `perf.data` in place. Next, run `perf report`, which displays output like the following:

<img width="616" height="188" alt="image" src="https://github.com/user-attachments/assets/c7adacdf-0da1-4002-8067-203a35d76348" />

❗❗❗❗❗ **Note:** Some issues remain in this version of perf because it depends on the implementation of other patches. I will continue to fix the remaining issues as more of those patches are incorporated.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
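Item 3 of the technical approach mentions a ring buffer for event data collection, but the description does not show its layout. As a rough illustration only, here is one common design for this job. It is a single-producer/single-consumer ring in which the PMU overflow interrupt pushes samples and a reader task drains them. It uses free-running `head`/`tail` indices and a lost-sample counter, so the interrupt path never blocks. This is similar in spirit to how Linux perf reports lost samples. All names below (`struct sample`, `struct sample_ring`, `ring_push()`, `ring_pop()`, `RB_CAPACITY`) are hypothetical and do not come from `include/nuttx/perf.h`. The sketch assumes C11 `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not the PR's implementation. */

#define RB_CAPACITY 8u /* number of slots; must be a power of two */

static_assert((RB_CAPACITY & (RB_CAPACITY - 1)) == 0,
              "RB_CAPACITY must be a power of two");

/* Hypothetical sample record. A real perf sample would also carry a
 * timestamp, CPU number, and call chain.
 */

struct sample
{
  uint64_t ip;  /* instruction pointer captured at overflow */
  uint32_t pid; /* task running when the sample was taken */
};

struct sample_ring
{
  struct sample buf[RB_CAPACITY];
  atomic_size_t head; /* advanced only by the producer (interrupt) */
  atomic_size_t tail; /* advanced only by the consumer (reader task) */
  size_t lost;        /* samples dropped because the ring was full */
};

static void ring_init(struct sample_ring *rb)
{
  atomic_init(&rb->head, 0);
  atomic_init(&rb->tail, 0);
  rb->lost = 0;
}

/* Producer, e.g. called from the PMU overflow handler. It never blocks:
 * if the ring is full, the sample is dropped and counted as lost.
 */

static bool ring_push(struct sample_ring *rb, struct sample s)
{
  size_t head = atomic_load_explicit(&rb->head, memory_order_relaxed);
  size_t tail = atomic_load_explicit(&rb->tail, memory_order_acquire);

  if (head - tail == RB_CAPACITY)
    {
      rb->lost++;
      return false;
    }

  rb->buf[head & (RB_CAPACITY - 1)] = s;

  /* Release: the slot contents become visible before the new head. */

  atomic_store_explicit(&rb->head, head + 1, memory_order_release);
  return true;
}

/* Consumer, e.g. the task that writes samples out to perf.data. */

static bool ring_pop(struct sample_ring *rb, struct sample *out)
{
  size_t tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
  size_t head = atomic_load_explicit(&rb->head, memory_order_acquire);

  if (head == tail)
    {
      return false; /* empty */
    }

  *out = rb->buf[tail & (RB_CAPACITY - 1)];

  /* Release: the slot is fully read before the producer may reuse it. */

  atomic_store_explicit(&rb->tail, tail + 1, memory_order_release);
  return true;
}
```

Dropping samples when the ring is full keeps the interrupt handler bounded in time. Because each ring assumes exactly one producer, an SMP system would typically use one ring per CPU or per event rather than sharing one across cores.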

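Sampling mode (item 5 of the technical approach) relies on PMU overflow interrupts. A common way to make a counter interrupt every `period` events is to preload it with `2^width - period`, so that it wraps after exactly `period` increments. The Linux ARM PMU driver, for example, does this. The sketch below models that arithmetic in plain C under that assumption; it is not the PR's driver code. `struct pmu_counter_model`, `sample_preload()`, `counter_add()`, and `on_overflow()` are hypothetical names. A real PMUv3 driver reads and writes system registers such as `PMEVCNTR<n>_EL0`, not a struct, and its event counters may be 32 or 64 bits wide depending on the PMU version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of one PMU event counter (illustration
 * only; real PMUv3 counters are system registers, not a struct).
 */

struct pmu_counter_model
{
  uint64_t value;    /* current count, kept within `width` bits */
  unsigned width;    /* counter width in bits: 32 or 64 */
  bool     overflow; /* models the counter's overflow status flag */
};

/* All-ones mask for a counter of the given width. */

static uint64_t counter_mask(unsigned width)
{
  return width >= 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;
}

/* Preload value that makes the counter wrap after exactly `period`
 * events: (2^width - period), computed modulo 2^width.
 */

static uint64_t sample_preload(uint64_t period, unsigned width)
{
  assert(period > 0 && period <= counter_mask(width));
  return (UINT64_C(0) - period) & counter_mask(width);
}

/* Count `events` more events. Returns true if the counter wrapped, which
 * is when real hardware would raise the overflow interrupt.
 */

static bool counter_add(struct pmu_counter_model *c, uint64_t events)
{
  uint64_t mask = counter_mask(c->width);
  bool wrapped = events > mask - c->value;

  c->value = (c->value + events) & mask;
  if (wrapped)
    {
      c->overflow = true;
    }

  return wrapped;
}

/* Hypothetical overflow handler: a real driver would capture the PC and
 * call chain here. This model only counts the sample and re-arms the
 * counter with a full period, ignoring any overshoot.
 */

static void on_overflow(struct pmu_counter_model *c, uint64_t period,
                        unsigned *samples)
{
  (*samples)++;
  c->overflow = false;
  c->value = sample_preload(period, c->width);
}
```

For a 32-bit counter and a period of 1000, the preload is `0xFFFFFC18`, and the 1000th event wraps the counter to zero. Re-arming with a full period ignores any events counted between the overflow and the handler. A production driver would typically fold that overshoot into the next period so that the sample spacing stays accurate.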