hujun260 opened a new pull request, #18155: URL: https://github.com/apache/nuttx/pull/18155
## Summary

This pull request series optimizes the NuttX performance counter infrastructure across six incremental commits:

1. **Race condition fix**: Protect hardware perf counter reads with spinlock
2. **Multi-architecture support**: Add configurable bit-width for different architectures (TriCore 31-bit, others 32-bit)
3. **Code simplification**: Remove unnecessary perf_init function and static initialization
4. **Structure refinement**: Reorganize implementation for userspace integration
5. **Userspace access**: Enable direct application access to hardware perf counters without syscalls
6. **32-bit optimization**: Remove unsupported atomic64 operations for 32-bit systems

These changes provide a cleaner, more efficient performance monitoring infrastructure while enabling applications to profile themselves directly.

## Changes

**Commit 1: Fix perf_gettime race condition (1 line)**
- Move up_perf_gettime() call after spinlock acquisition
- Prevents reading stale overflow state

**Commit 2: Add perf overflow offset (22 lines)**
- Add ARCH_PERF_COUNT_BITWIDTH configuration
- Support TriCore 31-bit and standard 32-bit architectures
- Properly calculate overflow correction for different bit widths

**Commit 3: Remove perf_init simplification (34 lines removed)**
- Remove perf_init() function declaration and call
- Use static initialization instead
- Avoid global variable in perf_update timer callback

**Commit 4: Refine code structure (27 lines removed)**
- Consolidate perf_gettime implementations
- Remove duplicate perf_convert and perf_getfreq
- Improve code organization for userspace library

**Commit 5: Userspace PMU access (118 lines added)**
- Add ARCH_HAVE_PERF_EVENTS_USER_ACCESS capability
- Create libs/libc/sched/clock_perf.c for userspace perf functions
- Update build system (CMakeLists.txt, Make.defs)
- Enable direct userspace access to hardware counters

**Commit 6: Remove 32-bit atomic64 support (50 lines removed)**
- Remove atomic64 wrapper for 64-bit support
- Eliminate unsupported atomic operations on 32-bit systems
- Simplify userspace perf implementation

**Total**: ~240 net lines changed, improved performance and maintainability

## Impact

- **Performance**: Direct userspace access eliminates syscall overhead for profiling
- **Multi-architecture**: Proper support for architectures with different counter widths
- **Simplification**: 50+ lines removed through consolidated code
- **Safety**: Race condition fixed by proper lock ordering
- **Compatibility**: Works across 32-bit and 64-bit architectures
- **Usability**: Applications can profile themselves without kernel involvement

## Technical Details

**Race condition fix:**

```c
// Before: Read might race with overflow update
clock_t now = up_perf_gettime();
irqstate_t flags = spin_lock_irqsave(&perf->lock);

// After: Protected read
irqstate_t flags = spin_lock_irqsave(&perf->lock);
clock_t now = up_perf_gettime();
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
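
**Overflow correction sketch (illustrative, not the PR's code):**

The overflow correction listed under Commit 2 can be pictured as extending an N-bit hardware counter to 64 bits by adding one full counter span each time the raw value wraps. The following is a minimal single-threaded sketch under stated assumptions: `perf_ext_s` and `perf_extend` are hypothetical names, the `bits` argument stands in for the ARCH_PERF_COUNT_BITWIDTH setting, and the counter is assumed to be sampled at least once per wrap period. In the kernel, the read and update would happen while holding the spinlock, as in the race condition fix.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state for one extended counter (not a NuttX type). */

struct perf_ext_s
{
  uint64_t overflow; /* Sum of full counter spans seen so far */
  uint32_t last;     /* Most recent raw reading, masked to the bit width */
};

/* Extend a raw reading of a 'bits'-wide counter to 64 bits.
 * 'bits' plays the role of ARCH_PERF_COUNT_BITWIDTH (for example 31 or 32).
 * Assumes at most one wrap between two consecutive calls.
 */

static uint64_t perf_extend(struct perf_ext_s *p, uint32_t raw,
                            unsigned int bits)
{
  uint64_t span;

  assert(bits >= 1 && bits <= 32);

  span = (uint64_t)1 << bits;      /* Number of distinct counter values */
  raw &= (uint32_t)(span - 1);     /* Drop bits the hardware does not count */

  if (raw < p->last)
    {
      p->overflow += span;         /* Counter wrapped since the last sample */
    }

  p->last = raw;
  return p->overflow + raw;
}
```

With a 31-bit width, a reading of `0x7ffffff0` followed by a reading of `5` is reported as `(1 << 31) + 5` rather than appearing to run backwards. The same code handles a 32-bit counter by passing `32`.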
