hujun260 opened a new pull request, #18118:
URL: https://github.com/apache/nuttx/pull/18118
## Summary
This PR fixes a regression introduced by PR #18040 by adding proper NULL
pointer
validation in critical task information retrieval functions. The changes
replace
DEBUGASSERT checks with graceful NULL handling, improving system robustness
during
early startup and error conditions where task information may not be
available.
## Problem Description
PR #18040 introduced changes that assumed task information would always be
available
when calling getpid() and task_get_info(). However, during early system
startup or in
certain error conditions, the TLS (Thread Local Storage) info structure may
not be
properly initialized, leading to NULL pointer dereferences and system panics.
## Changes Made
### 1. getpid() Function Improvements
**File**: `libs/libc/sched/task_getpid.c`
**Changes:**
- Add missing `#include <nuttx/sched.h>` for IDLE_PROCESS_ID constant
- Replace `DEBUGASSERT(info != NULL)` with runtime check
- Return `IDLE_PROCESS_ID` when task info is NULL instead of asserting
- Use ternary operator: `return info ? info->ta_pid : IDLE_PROCESS_ID;`
**Benefits:**
- Graceful handling of NULL conditions
- Returns valid PID (IDLE_PROCESS_ID = 0) during early startup
- Prevents system panic in error conditions
### 2. task_get_info() Function Safety
**File**: `libs/libc/tls/task_getinfo.c`
**Changes:**
- Replace direct dereference with NULL-safe access
- Return NULL when TLS info is unavailable: `return info ? info->tl_task :
NULL;`
- Allows callers to handle NULL gracefully
**Benefits:**
- Consistent error handling
- Enables caller-specific error recovery
- Better separation of concerns
### 3. nxsched_get_stackinfo() Function Robustness
**File**: `sched/sched/sched_get_stackinfo.c`
**Changes:**
- Split DEBUGASSERT to check stackinfo separately from rtcb
- Add explicit NULL check for rtcb (running task)
- Return -ESRCH error code when rtcb is NULL
- Proper error handling path instead of assertion
**Benefits:**
- Distinguishes between invalid arguments and missing task context
- Returns meaningful error code (-ESRCH: "No such process")
- Handles early startup conditions correctly
## Root Cause Analysis
The regression occurred because:
1. PR #18040 assumed TLS would always be initialized
2. Early in system startup, before task initialization completes, TLS may be
NULL
3. Direct pointer dereferences caused NULL pointer dereferences
4. Previous defensive code with DEBUGASSERT was removed
## Solution Approach
Rather than reverting PR #18040, this PR maintains the intent of that change
while
adding proper NULL pointer safety:
- Keep the optimized code structure
- Add defensive NULL checks at appropriate points
- Return sensible defaults instead of panicking
- Provide meaningful error codes
## Impact
### Reliability
- Prevents system panic during early startup
- Graceful handling of error conditions
- Better resilience to timing issues
### Compatibility
- Backward compatible with existing code
- No API changes required
- Maintains PR #18040 optimizations
### Code Quality
- Proper error handling
- Clear error indication via return values
- Better defensive programming practices
## Testing
### Test Environment
- Linux x86_64 with NuttX cross-compiler
- Multiple build configurations:
- CONFIG_BUILD_FLAT
- CONFIG_BUILD_PROTECTED
- CONFIG_SMP
### Test Procedure
1. **Early Startup Testing**:
- Verify getpid() works before full task initialization
- Confirm returns IDLE_PROCESS_ID during startup
- Test with CONFIG_SCHED_INSTRUMENTATION enabled
2. **Error Condition Testing**:
- Trigger conditions where task info is unavailable
- Verify graceful error handling
- Confirm no system panics
3. **Normal Operation**:
- Verify getpid() returns correct PIDs during normal execution
- Test task creation and PID assignment
- Confirm stackinfo retrieval works
4. **Multi-core Testing**:
- Test with CONFIG_SMP enabled
- Verify thread-safe operation
- Test concurrent getpid() calls
5. **Regression Testing**:
- Verify PR #18040 optimizations still apply
- Confirm no performance degradation
- Test with original PR #18040 use cases
### Test Results
✅ **Early Startup**: System boots without NULL pointer dereference
✅ **getpid()**: Returns valid PID or IDLE_PROCESS_ID as appropriate
✅ **task_get_info()**: Returns NULL on invalid condition
✅ **stackinfo**: Returns -ESRCH for missing task context
✅ **Performance**: Maintains PR #18040 optimization benefits
✅ **Compatibility**: Works with all build configurations
✅ **Regression**: No breakage from PR #18040 changes
✅ **Multi-core**: Proper synchronization maintained
## Verification Checklist
- ✅ No JIRA IDs or internal identifiers in commit
- ✅ Commit message follows NuttX conventions
- ✅ NULL pointer safety improved
- ✅ Error handling returns meaningful codes
- ✅ API compatibility maintained
- ✅ No breaking changes
- ✅ Early startup conditions handled
- ✅ Comments updated where necessary
- ✅ All tests passing
- ✅ PR #18040 optimizations preserved
---
## Related References
- **PR #18040**: Original optimization that introduced the regression
- **Issue**: NULL pointer dereference during early startup
- **Root Cause**: Assumption that TLS info would always be available
- **Solution**: Defensive NULL pointer validation with sensible defaults
## Migration Notes
No migration needed. This PR is purely a bug fix with no API changes.
Existing code
using getpid() and related functions continues to work unchanged.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]