imbajin commented on PR #336:
URL:
https://github.com/apache/incubator-hugegraph-computer/pull/336#issuecomment-3393063945
## Summary
This PR introduces a comprehensive scheduler redesign with priority-based
scheduling, task dependencies, and cron support. The architecture is
well-designed, but there are **critical concurrency issues** that must be
addressed before merging.
**Verdict: REQUEST CHANGES** ⚠️
---
## Architecture Changes
### Major Refactoring ✅
The PR completely redesigns the scheduler from a simple queue-based system
to a sophisticated multi-component architecture:
**Old Architecture:**
- Simple `SpaceQueue` (FIFO)
- Basic `Broker` for worker assignment
- Single-threaded dispatch loop
**New Architecture:**
- `SchedulerResourceManager` - Worker pool management
- `SchedulerTaskManager` - Task lifecycle, dependencies, priorities
- `SchedulerAlgorithmManager` - Scheduling algorithm logic
- `SchedulerCronManager` - Recurring task support
- Event-driven scheduling with ticker-based dispatch
### New Features:
1. ✅ Priority-based scheduling with integer priorities
2. ✅ Task dependency management (`preorders`)
3. ✅ Cron expressions for recurring tasks
4. ✅ Exclusive/concurrent execution modes
5. ✅ Configurable soft scheduling
---
## Critical Issues 🚨
### 1. TOCTTOU Race Condition (scheduler_bl.go:421-423)
**Severity: CRITICAL**
```go
// TODO: Is here need a lock? TOCTTOU
if taskInfo.State != structure.TaskStateWaiting {
logrus.Errorf("task state is not in 'Waiting' state, taskID: %v",
taskInfo)
return
}
```
The TODO correctly identifies a Time-Of-Check-Time-Of-Use vulnerability.
State check and operations are not atomic.
**Fix:**
```go
defer taskInfo.Unlock(taskInfo.Lock())
if taskInfo.State != structure.TaskStateWaiting {
// ...
}
```
### 2. Resource Leak: Channel Not Closed (scheduler_bl.go:62)
**Severity: HIGH**
The `startChan` is never closed, causing potential goroutine leaks during
shutdown.
**Fix:** Add cleanup:
```go
func (s *ScheduleBl) Shutdown() {
close(s.startChan)
}
```
### 3. Dependency State Validation Missing (scheduler_bl.go:234-244)
**Severity: HIGH**
No check if dependency tasks are completed. Tasks could depend on
failed/canceled tasks.
**Fix:**
```go
if depTask.State != structure.TaskStateComplete {
return false, fmt.Errorf("dependency task %d is not complete (state:
%s)", depTaskID, depTask.State)
}
```
### 4. Potential Deadlock Pattern (scheduler_bl.go:97-99)
**Severity: HIGH**
Multiple functions acquire locks then call `TryScheduleNextTasks`, which
conditionally locks. This pattern is deadlock-prone.
**Fix:** Use consistent locking strategy, consider `sync.RWMutex` for
read-heavy operations.
---
## High Priority Issues ⚠️
### 5. Nil Pointer Risk (task_bl.go:87-92)
If `Scheduler.cronManager` is nil (initialization order issue), this will
panic:
```go
if err := Scheduler.cronManager.CheckCronExpression(cronExpr); err != nil {
```
Add nil checks.
### 6. API Breaking Change (scheduler_bl.go:502)
`CloseCurrent` signature changed from `(taskId int32)` to `(taskId int32,
removeWorkerName ...string)`. Document this breaking change and verify all call
sites are updated.
### 7. Commented Out Critical Code (scheduler_bl.go:229)
```go
//defer s.Unlock(s.Lock())
```
Critical locking code commented without explanation - this is dangerous.
### 8. Incomplete Implementation (scheduler_bl.go:159)
```go
// TODO: NEED TO JUDGE IF THE TASK CAN CONCURRENTLY RUNNING
// NOT only by user setting, but also by scheduler setting
```
Important concurrent execution logic deferred.
---
## Medium Priority Issues
### 9. Missing Input Validation (task_bl.go:71-79)
No upper bound on priority values. Large priorities could cause unexpected
behavior.
### 10. Silent Error Handling (scheduler_bl.go:441-443)
```go
if err := s.taskManager.AddTaskStartSequence(taskInfo.ID); err != nil {
logrus.Errorf("failed to add task '%d' to start sequence: %v",
taskInfo.ID, err)
}
```
Errors logged but not handled.
---
## CI/CD Status
### Failed Checks ❌
1. **check-license-header** - Missing Apache license headers in new files
2. **dependency-review** - New dependencies need review
### Passed Checks ✅
- CodeQL (Go & Java)
- Dependency license check
### Test Coverage ⚠️
- PR mentions tests in `/test` folder but not visible in diff
- Need unit tests for:
- Priority scheduling algorithm
- Dependency resolution
- Cron task scheduling
- Concurrent execution
---
## Recommendations
### Must Fix Before Merge:
1. Fix TOCTTOU race condition with atomic state checking
2. Add resource cleanup (close channels on shutdown)
3. Validate dependency task states
4. Fix failing CI checks (license headers, dependency review)
5. Review and fix all locking patterns
6. Add nil pointer checks for managers
### Should Fix:
7. Complete TODO implementations (concurrent execution logic)
8. Document API breaking changes
9. Add comprehensive test coverage
10. Add input validation (priority bounds)
11. Remove commented code with explanation
### Documentation Needed:
12. Migration guide for scheduler API changes
13. Configuration documentation (`start_chan_size`, `ticker_interval`,
`soft_schedule`)
14. Cron expression format specification
---
## Positive Aspects 👍
1. ✅ Well-architected component separation
2. ✅ Good configuration externalization
3. ✅ Comprehensive feature set
4. ✅ Improved observability with detailed logging
5. ✅ Added Docker support documentation
---
## Risk Assessment
**Risk Level:** MEDIUM-HIGH due to concurrency issues in core scheduling
logic
**Estimated Time to Fix:** 2-3 days for critical fixes, 1 week for
comprehensive improvements
The architecture and features are excellent, but the critical concurrency
issues must be resolved before this can be safely merged to production.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]