Serge Huber created UNOMI-939:
---------------------------------

             Summary: SchedulerService correctness: lock leak, persistent flag, 
parallel execution, recovery bugs
                 Key: UNOMI-939
                 URL: https://issues.apache.org/jira/browse/UNOMI-939
             Project: Apache Unomi
          Issue Type: Sub-task
          Components: unomi(-core)
    Affects Versions: unomi-3.1.0
            Reporter: Serge Huber
            Assignee: Serge Huber
             Fix For: unomi-3.1.0


Code review of PR #760 identified five correctness bugs in the new 
{{SchedulerServiceImpl}} and supporting classes:

*1. Lock held permanently after shutdown check ({{{}TaskExecutionManager{}}})* 
After {{prepareForExecution}} succeeds (lock acquired, status = RUNNING, 
persisted), a final {{isShutdownNow()}} check returns early without calling the 
executor or any status callback, and without releasing the lock. The task stays 
RUNNING until the lock timeout expires with no recovery signal. _Fix:_ Release 
lock and reset to SCHEDULED before the early return.

*2. {{createRecurringTask}} silently drops the {{persistent}} parameter 
({{{}SchedulerServiceImpl:1562{}}})* The method always calls 
{{.nonPersistent()}} regardless of the {{boolean persistent}} argument. Any 
caller passing {{true}} silently gets a non-persistent task. _Fix:_ 
Conditionally call {{.nonPersistent()}} only when {{{}persistent == false{}}}.

*3. RUNNING tasks double-processed by crash recovery 
({{{}TaskRecoveryManager{}}}, {{{}SchedulerServiceImpl{}}})* 
{{recoverCrashedTasks()}} calls both {{recoverRunningTasks()}} and 
{{recoverLockedTasks()}} sequentially. {{findLockedTasks()}} returns tasks with 
{{lockOwner != null}} including RUNNING ones, so a task already processed by 
{{recoverRunningTasks}} is processed a second time, risking a double restart. 
_Fix:_ Exclude {{RUNNING}} status from the {{findLockedTasks}} filter.

*4. {{MAX_CRASH_RECOVERY_AGE_MINUTES}} declared but never enforced 
({{{}TaskRecoveryManager:35{}}})* The constant is defined but never referenced. 
Tasks that have been crashed for days are restarted on every recovery cycle. 
_Fix:_ Check {{statusDetails.crashTime}} against 
{{MAX_CRASH_RECOVERY_AGE_MINUTES}} in {{{}shouldRestartTask{}}}.

*5. {{processTaskGroup}} ignores per-task {{allowParallelExecution}} flag 
({{{}SchedulerServiceImpl:921{}}})* The type-level {{hasRunningTaskOfType}} 
check blocks all tasks of a given type when any one is running, regardless of 
individual {{allowParallelExecution=true}} flags. _Fix:_ Check the flag per 
task; only block non-parallel tasks when a sibling is running.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to