[
https://issues.apache.org/jira/browse/UNOMI-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Sinovassin-Naïk updated UNOMI-878:
-------------------------------------------
Fix Version/s: unomi-3.1.0
(was: unomi-3.0.0)
> Enhanced Cluster-Aware Task Scheduling Service with Improved Developer
> Experience and Persistence Integration
> -------------------------------------------------------------------------------------------------------------
>
> Key: UNOMI-878
> URL: https://issues.apache.org/jira/browse/UNOMI-878
> Project: Apache Unomi
> Issue Type: Sub-task
> Components: unomi(-core)
> Affects Versions: unomi-3.0.0
> Reporter: Serge Huber
> Assignee: Serge Huber
> Priority: Major
> Fix For: unomi-3.1.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> The new SchedulerService provides a robust, cluster-aware task scheduling
> system with several valuable features that improve reliability, scalability,
> and developer experience:
> h1. Key Features:
> 1. Integrated Persistence Layer:
> - Seamless integration with Unomi's PersistenceService for task storage
> - Automatic task state persistence and recovery
> - Support for both persistent and in-memory task storage
> - Efficient task querying and filtering capabilities
> 2. Advanced Cluster Support:
> - Node-specific task execution control with `executorNode` configuration
> - Ability for nodes to opt-out of task execution
> - Automatic task distribution across cluster nodes
> - Task execution isolation through node-specific locking
> - Built-in crash recovery for failed nodes
> - Support for running tasks on all nodes or specific nodes
> 3. Enhanced Developer Experience:
> - Fluent Builder API for intuitive task creation
> - Simple recurring task creation for common use cases
> - Comprehensive task lifecycle management
> - Rich task monitoring capabilities
> 4. Robust Task Management:
> - Automatic task recovery from node crashes
> - Configurable retry policies
> - Task resumption from checkpoints
> - Automatic task purging with configurable TTL
> - Support for both one-shot and recurring tasks
> 5. Comprehensive Karaf Shell Integration:
> - Rich set of shell commands for task management
> - Real-time task monitoring and control
> - Support for both persistent and in-memory tasks
> 6. RESTful API Integration:
> - Full REST API support for task management
> - JSON-based interface with proper error handling
> - CORS-enabled for web client integration
> - Role-based access control (requires ADMINISTRATOR role)
> 7. Comprehensive Testing Suite:
> A. Unit Tests:
> - Extensive test coverage in `SchedulerServiceImplTest`
> - Tests for all core functionality:
> * Task creation and scheduling
> * Persistent vs memory storage
> * Task status management
> * Failure handling and retry logic
> * Cluster node behavior
> * Task filtering and querying
> * Task cancellation and cleanup
> B. Integration Tests (`SchedulerIT`):
> - End-to-end testing of REST API
> - Real persistence layer integration
> - Cluster behavior verification
> - Task execution verification
> - Error handling scenarios
> C. Test Coverage:
> - Task lifecycle management
> - Persistence operations
> - Cluster coordination
> - Error recovery
> - Configuration changes
> - Status transitions
> - Memory vs persistent storage
> h2. Example Usage (New Builder Pattern):
> {code:java}
> // Create a cluster-wide persistent
> taskschedulerService.newTask("dataSync")
> .withPeriod(1, TimeUnit.HOURS)
> .withSimpleExecutor(() -> syncData())
> .schedule();
> // Create a node-specific memory
> taskschedulerService.newTask("localCache")
> .nonPersistent()
> .withPeriod(5, TimeUnit.MINUTES)
> .withSimpleExecutor(() -> cleanCache())
> .schedule(); {code}
> h2. HTTP request examples:
> h3. List Tasks
> {noformat}
> GET /cxs/tasks
> GET /cxs/tasks?status=RUNNING
> GET /cxs/tasks?type=dataSync
> GET /cxs/tasks?offset=0&limit=50&sortBy=lastExecutionDate
> {noformat}
> Example Response:
> {noformat}
> {
> "list": [
> {
> "itemId": "123e4567-e89b-12d3-a456-426614174000",
> "taskType": "dataSync",
> "status": "RUNNING",
> "nextScheduledExecution": "2024-03-20T15:00:00Z",
> "lastExecutionDate": "2024-03-20T14:00:00Z",
> "failureCount": 0,
> "persistent": true,
> "parameters": {
> "targetSystem": "CRM",
> "batchSize": 1000
> }
> }
> ],
> "offset": 0,
> "pageSize": 50,
> "totalSize": 1
> }
> {noformat}
> h3. Get Task Details
> {noformat}
> GET /cxs/tasks/{taskId}
> {noformat}
> Example Response:
> {noformat}
> {
> "itemId": "123e4567-e89b-12d3-a456-426614174000",
> "taskType": "dataSync",
> "status": "RUNNING",
> "nextScheduledExecution": "2024-03-20T15:00:00Z",
> "lastExecutionDate": "2024-03-20T14:00:00Z",
> "failureCount": 0,
> "persistent": true,
> "parameters": {
> "targetSystem": "CRM",
> "batchSize": 1000
> },
> "currentStep": "Processing records",
> "statusDetails": {
> "processedCount": 500,
> "totalCount": 1000
> }
> }
> {noformat}
> h3. Cancel Task
> {noformat}
> DELETE /cxs/tasks/{taskId}
> {noformat}
> Response: 204 No Content
> h3. Retry Failed Task
> {noformat}
> POST /cxs/tasks/{taskId}/retry
> POST /cxs/tasks/{taskId}/retry?resetFailureCount=true
> {noformat}
> Example Response:
> {noformat}
> {
> "itemId": "123e4567-e89b-12d3-a456-426614174000",
> "taskType": "dataSync",
> "status": "SCHEDULED",
> "nextScheduledExecution": "2024-03-20T15:05:00Z",
> "failureCount": 0,
> "persistent": true
> }
> {noformat}
> h3. Update Task Configuration
> {noformat}
> PUT /cxs/tasks/{taskId}/config?maxRetries=5&retryDelay=60000
> {noformat}
> Example Response:
> {noformat}
> {
> "itemId": "123e4567-e89b-12d3-a456-426614174000",
> "taskType": "dataSync",
> "status": "SCHEDULED",
> "maxRetries": 5,
> "retryDelay": 60000,
> "persistent": true
> }
> {noformat}
> h3. Resume Crashed Task
> {noformat}
> POST /cxs/tasks/{taskId}/resume
> {noformat}
> Example Response:
> {noformat}
> {
> "itemId": "123e4567-e89b-12d3-a456-426614174000",
> "taskType": "dataSync",
> "status": "SCHEDULED",
> "nextScheduledExecution": "2024-03-20T15:00:00Z",
> "checkpointData": {
> "lastProcessedId": "12345",
> "progress": 50
> },
> "persistent": true
> }
> {noformat}
> h3. Error Responses
> All endpoints return appropriate HTTP status codes:
> * 400 Bad Request - Invalid parameters
> * 404 Not Found - Task not found
> * 403 Forbidden - Insufficient permissions
> * 500 Internal Server Error - Server-side error
> Example Error Response:
> {noformat}
> {
> "error": "Task not found",
> "code": "404",
> "message": "No task exists with ID: 123e4567"
> }
> {noformat}
> Technical Benefits:
> 1. Improved System Reliability:
> - Automatic crash recovery prevents task loss
> - Persistent storage ensures task survival across restarts
> - Cluster-aware execution prevents task conflicts
> 2. Enhanced Scalability:
> - Flexible node participation in task execution
> - Efficient task distribution across cluster
> - Support for both cluster-wide and node-specific tasks
> 3. Better Resource Management:
> - Option to run memory-only tasks for better performance
> - Automatic task cleanup to prevent storage bloat
> - Configurable thread pools for task execution
> 4. Operational Excellence:
> - Comprehensive task status tracking
> - Built-in monitoring capabilities
> - Automatic task recovery and cleanup
> Migration Impact:
> - Fully backward compatible
> - New features are opt-in
> - Existing task definitions continue to work as before
> Here is an example of the output of a Karaf Shell command :
> {noformat}
> karaf@root()> task-list
> ID │ Type │ Status
> │ Next Run │ Last Run │ Failures │ Persistent
> ─────────────────────────────────────┼───────────────────────────────┼───────────┼─────────────────────┼─────────────────────┼──────────┼───────────
> 208fa5c6-589f-41c8-a1c9-690c8aa78fb0 │ task-purge │
> RUNNING │ - │ - │ 0 │ Storage
> 2acbc53f-9431-43cc-8f4b-732cb023600d │ task-purge │
> RUNNING │ - │ - │ 0 │ Storage
> e7aabbe9-7941-4c08-8678-38c4aa610a53 │ task-purge │
> RUNNING │ - │ - │ 0 │ Storage
> 61e069ac-b56f-4e13-b3e4-bfc6e3f7bb4e │ task-purge │
> RUNNING │ - │ - │ 0 │ Storage
> c9564e10-da1a-486c-af9d-21ae01f224a7 │ profile-purge │
> RUNNING │ - │ - │ 0 │ Storage
> 9fe24a5c-19b4-4a94-ada1-01762f68c8e3 │ segment-date-recalculation │
> RUNNING │ - │ - │ 0 │ Storage
> 8ea91422-99e4-4f0d-b5ab-5b08ff3ea6da │ cache-refresh-Segment │
> COMPLETED │ 2025-02-12 08:31:25 │ 2025-02-12 08:45:14 │ 0 │ Memory
> 7b104452-309b-416c-9b6d-ddb322e4ef31 │ scope-refresh │
> COMPLETED │ 2025-02-12 08:31:25 │ 2025-02-12 08:45:14 │ 0 │ Memory
> ec29a924-fd7c-4887-91c4-b103fce439cb │ cache-refresh-ConditionType │
> COMPLETED │ 2025-02-12 08:31:25 │ 2025-02-12 08:45:07 │ 0 │ Memory
> 3641bd9f-8430-4a1a-93b6-862fac5b958a │ groovy-actions-refresh │
> COMPLETED │ 2025-02-12 08:31:30 │ 2025-02-12 08:45:14 │ 0 │ Memory
> 2a05faa8-7cea-4e22-9e9c-7cfe438eeee6 │ rules-statistics-refresh │
> COMPLETED │ 2025-02-12 08:31:25 │ 2025-02-12 08:45:06 │ 0 │ Memory
> 93ac16cc-4c99-4694-9aa0-d2ec0e53a7a3 │ rules-refresh │
> COMPLETED │ 2025-02-12 08:31:25 │ 2025-02-12 08:45:14 │ 0 │ Memory
> c2a339b8-b89c-4dd8-81c6-5c221b0c8fb6 │ property-type-load │
> COMPLETED │ 2025-02-12 08:31:25 │ 2025-02-12 08:45:08 │ 0 │ Memory
> 41c3444b-1520-4284-a54f-68b133fad9a2 │ cache-refresh-PropertyMergeSt │
> COMPLETED │ 2025-02-12 08:31:25 │ 2025-02-12 08:45:06 │ 0 │ Memory
> e86c3d4e-223d-40ee-81c3-ba7577e80913 │ cache-refresh-Scoring │
> COMPLETED │ 2025-02-12 08:31:25 │ 2025-02-12 08:45:14 │ 0 │ Memory
> 8d14195f-d8cd-4544-bd85-99746f090138 │ cache-refresh-ValueType │
> COMPLETED │ 2025-02-12 08:31:25 │ 2025-02-12 08:45:06 │ 0 │ Memory
> 74c7eeb6-5f26-4c80-b3d6-12bb35c6181e │ cache-refresh-ActionType │
> COMPLETED │ 2025-02-12 08:31:25 │ 2025-02-12 08:45:07 │ 0 │ Memory
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)