hujun260 opened a new pull request, #18241:
URL: https://github.com/apache/nuttx/pull/18241

   ## Summary
   
   This PR addresses critical issues in the SMP scheduler's CPU selection and 
task management logic:
   1. Fixes incorrect CPU initialization in `nxsched_select_cpu()` preventing 
valid CPU selection
   2. Removes redundant lock validation that prevented optimal CPU selection 
for bound tasks
   3. Simplifies ready-to-run list addition by eliminating unnecessary bounds 
checking
   4. Fixes improper task cleanup sequencing in `nxsched_remove_self()` 
operation
   
   These changes ensure correct task placement on SMP systems and proper 
scheduler behavior during task removal.
   
   ## Motivation and Problem
   
   **Issue 1: Invalid CPU Selection Initialization**
   - `nxsched_select_cpu()` initializes `cpu = CONFIG_SMP_NCPUS` (invalid 
sentinel value)
   - No validation ensured valid CPU selection before return
   - Tasks could be assigned to non-existent CPUs
   
   **Issue 2: Lock-Check Preventing Optimal Scheduling**
   - Condition `!nxsched_islocked_tcb(rtcb)` prevented selection of CPU-locked 
tasks
   - CPU-locked tasks should be valid candidates for selection, not excluded
   - This violated SMP affinity semantics
   
   **Issue 3: Redundant Condition in Ready-to-Run Addition**
   - Check `if (target_cpu < CONFIG_SMP_NCPUS)` was unnecessary after CPU 
selection
   - `nxsched_select_cpu()` already guarantees valid CPU return (now with 
assertion)
   - Removed dead-code conditional block
   
   **Issue 4: Incorrect Task Cleanup Sequencing**
   - `nxsched_switch_running()` called in `nxsched_remove_self()` after 
function return
   - Proper cleanup requires call before exit
   - Relocated to `nxsched_remove_running()` for correct sequencing
   
   ## Changes Made
   
   ### File: sched/sched/sched.h
   
   1. **Line 580**: Changed CPU initialization from `CONFIG_SMP_NCPUS` to 
`0xff` (invalid sentinel)
      - Allows detection of failed CPU selection
   
   2. **Line 603-604**: Removed lock check from CPU selection condition
      - Before: `else if (rtcb->sched_priority <= minprio && 
!nxsched_islocked_tcb(rtcb))`
      - After: `else if (rtcb->sched_priority <= minprio)`
      - Rationale: CPU-locked tasks are valid candidates for selection
   
   3. **Line 613**: Added `DEBUGASSERT(cpu != 0xff)` before return
      - Validates CPU selection always produces valid result
      - Catches failures in CPU selection algorithm
   
   ### File: sched/sched/sched_addreadytorun.c
   
   1. **Line 166**: Added priority check to switch_equal condition
      - Before: `if (switch_equal)`
      - After: `if (switch_equal && sched_priority > 0)`
      - Prevents invalid priority decrement for idle thread
   
   2. **Line 258**: Moved `tcb` declaration outside conditional block
      - Now declared at function entry: `FAR struct tcb_s *tcb = 
current_task(target_cpu);`
      - Enables removal of bounds checking
   
   3. **Lines 267-276**: Removed `if (target_cpu < CONFIG_SMP_NCPUS)` 
conditional
      - Removed dead-code check (CPU selection guarantees valid result)
      - Added `btcb->cpu = target_cpu;` assignment for CPU tracking
      - Simplified to direct comparison: `if (tcb->sched_priority < 
btcb->sched_priority)`
   
   4. **Line 267**: Added comment explaining CPU assignment
      - "In some cases, such as setaffinity, cpu need to be used."
   
   ### File: sched/sched/sched_process_delivered.c
   
   1. **Line 108**: Removed bounds check from CPU comparison
      - Before: `if (target_cpu < CONFIG_SMP_NCPUS && target_cpu != cpu && ...)`
      - After: `if (target_cpu != cpu && ...)`
      - Rationale: CPU selection guarantees valid result
   
   ### File: sched/sched/sched_removereadytorun.c
   
   1. **Line 166**: Relocated `nxsched_switch_running(tcb->cpu, false)` call
      - Moved from `nxsched_remove_self()` to `nxsched_remove_running()`
      - Ensures proper cleanup sequencing before task exit
      - Prevents use-after-free scenarios
   
   ## Impact
   
   - **Correctness**: Fixes fundamental SMP scheduler bugs affecting task 
placement
   - **Performance**: Eliminates unnecessary validation checks after guaranteed 
CPU selection
   - **Safety**: Adds DEBUGASSERT to catch CPU selection failures in development
   - **Reliability**: Corrects task cleanup sequencing to prevent race 
conditions
   
   ## Verification Checklist
   
   - [x] Code compiles without warnings on all ARM architectures (ARMv7-A, 
ARMv7-R, ARM64)
   - [x] SMP scheduler logic validated for multi-CPU task placement
   - [x] CPU affinity masks properly respected in selection
   - [x] Task removal sequencing verified for safe cleanup
   - [x] Ready-to-run list operations tested in single and multi-CPU 
configurations
   - [x] No regression in single-CPU configurations (CONFIG_SMP=0)
   - [x] DEBUGASSERT validates CPU selection correctness
   
   ## Testing Scenarios
   
   1. **SMP Task Placement**
      - Create multiple tasks with CPU affinity masks
      - Verify correct CPU assignment via task introspection
      - Confirm tasks execute on assigned CPUs
   
   2. **Load Balancing**
      - Verify CPU selection chooses least-loaded eligible CPU
      - Confirm priority ordering respected in multi-CPU scenario
      - Test with mixed priority levels
   
   3. **CPU-Locked Tasks**
      - Verify CPU-locked tasks remain candidate for selection
      - Confirm proper sequencing when multiple locked tasks present
      - Test interaction with affinity-based scheduling
   
   4. **Task Removal**
      - Verify proper cleanup on task exit
      - Confirm no use-after-free with current_task() calls
      - Test rapid task creation/destruction cycles
   
   5. **Edge Cases**
      - Single eligible CPU scenario
      - All CPUs saturated with high-priority tasks
      - Task switching between CPUs during lifecycle
   
   ## Technical Notes
   
   **CPU Selection Algorithm:**
   - Uses invalid sentinel (0xff) to detect selection failure
   - Ensures every valid path through algorithm assigns valid CPU
   - DEBUGASSERT provides development-time validation
   - Production code operates with guaranteed correct CPU value
   
   **Ready-to-Run Addition:**
   - Dependency on valid CPU selection enables code simplification
   - CPU assignment (`btcb->cpu = target_cpu`) critical for affinity tracking
   - Removes defensive programming unnecessary after guarantee
   
   **Task Removal Sequencing:**
   - `nxsched_switch_running()` must execute while task TCB valid
   - Moving call to `nxsched_remove_running()` prevents post-exit access
   - Maintains exception safety for cleanup operations
   
   ## Build Information
   
   **Compiler:** ARM GCC 10.x and later  
   **Architectures:** ARMv7-A, ARMv7-R, ARM64, ARM Cortex-M series  
   **Configurations:** CONFIG_SMP enabled (also compatible with single-CPU)  
   **Flags:** `-Wall -Wextra -Werror`
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to