Serge Huber created UNOMI-939:
---------------------------------
Summary: SchedulerService correctness: lock leak, persistent flag,
parallel execution, recovery bugs
Key: UNOMI-939
URL: https://issues.apache.org/jira/browse/UNOMI-939
Project: Apache Unomi
Issue Type: Sub-task
Components: unomi(-core)
Affects Versions: unomi-3.1.0
Reporter: Serge Huber
Assignee: Serge Huber
Fix For: unomi-3.1.0
Code review of PR #760 identified five correctness bugs in the new
{{SchedulerServiceImpl}} and supporting classes:
*1. Lock held permanently after shutdown check ({{{}TaskExecutionManager{}}})*
After {{prepareForExecution}} succeeds (lock acquired, status = RUNNING,
persisted), a final {{isShutdownNow()}} check returns early without calling the
executor or any status callback, and without releasing the lock. The task stays
RUNNING until the lock timeout expires with no recovery signal. _Fix:_ Release
lock and reset to SCHEDULED before the early return.
*2. {{createRecurringTask}} silently drops the {{persistent}} parameter
({{{}SchedulerServiceImpl:1562{}}})* The method always calls
{{.nonPersistent()}} regardless of the {{boolean persistent}} argument. Any
caller passing {{true}} silently gets a non-persistent task. _Fix:_
Conditionally call {{.nonPersistent()}} only when {{{}persistent == false{}}}.
*3. RUNNING tasks double-processed by crash recovery
({{{}TaskRecoveryManager{}}}, {{{}SchedulerServiceImpl{}}})*
{{recoverCrashedTasks()}} calls both {{recoverRunningTasks()}} and
{{recoverLockedTasks()}} sequentially. {{findLockedTasks()}} returns tasks with
{{lockOwner != null}} including RUNNING ones, so a task already processed by
{{recoverRunningTasks}} is processed a second time, risking a double restart.
_Fix:_ Exclude {{RUNNING}} status from the {{findLockedTasks}} filter.
*4. {{MAX_CRASH_RECOVERY_AGE_MINUTES}} declared but never enforced
({{{}TaskRecoveryManager:35{}}})* The constant is defined but never referenced.
Tasks that have been crashed for days are restarted on every recovery cycle.
_Fix:_ Check {{statusDetails.crashTime}} against
{{MAX_CRASH_RECOVERY_AGE_MINUTES}} in {{{}shouldRestartTask{}}}.
*5. {{processTaskGroup}} ignores per-task {{allowParallelExecution}} flag
({{{}SchedulerServiceImpl:921{}}})* The type-level {{hasRunningTaskOfType}}
check blocks all tasks of a given type when any one is running, regardless of
individual {{allowParallelExecution=true}} flags. _Fix:_ Check the flag per
task; only block non-parallel tasks when a sibling is running.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)