JNSimba opened a new pull request, #62238:
URL: https://github.com/apache/doris/pull/62238

   ## Summary
   
   - Add a new `RETRYING` status to the `JobStatus` enum for streaming jobs, so users can distinguish healthy running jobs from jobs that are hitting errors and retrying automatically.
   - Previously, user-initiated pauses and recoverable errors shared the same `PAUSED` status, which made the two cases impossible to tell apart in `show streaming jobs`.
   - Now `PAUSED` is reserved for user-initiated pauses and unrecoverable errors, while `RETRYING` indicates that the job is recovering automatically with exponential backoff.
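
A minimal sketch of what the extended enum could look like. Only the states that appear in the state diagram are listed, and the body of `isRunning()` is an assumption (that it returns true for exactly `RUNNING` and `RETRYING`), not the actual Doris source:

```java
// Hypothetical sketch of JobStatus after this change; constants are taken
// from the state diagram, and isRunning() is an assumed implementation.
enum JobStatus {
    PENDING,
    RUNNING,
    // New: a recoverable error occurred and the job is auto-retrying with backoff.
    RETRYING,
    // Now only for user-initiated pauses and unrecoverable errors.
    PAUSED,
    STOPPED,
    FINISHED;

    // Assumption: RETRYING counts as "running" so callers that only check
    // isRunning() keep treating an auto-recovering job as active.
    boolean isRunning() {
        return this == RUNNING || this == RETRYING;
    }
}
```

With this split, code that only asks `isRunning()` behaves as before for a recovering job, while `show streaming jobs` can still report `RETRYING` and `RUNNING` as separate states.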
   
   ### State transition diagram
   
   ```
                             CREATE JOB
                                 |
                                 v
                            +---------+
                            | PENDING |
                            +----+----+
                                 | createStreamingTask(), autoResumeCount = 0
                                 v
                            +---------+
               +------------|  RUNNING |------------------+
               |            +--+----+--+                  |
               |               |    |                     |
          user PAUSE      task fail/ |               hasReachedEnd
         (MANUAL_PAUSE)  meta fail/  |                    |
               |         sched fail  |                    v
               |        (recoverable)|               +----------+
               |               |    data quality     | FINISHED |
               |               |    error            +----------+
               |               |  (unrecoverable)
               v               v        |
          +--------+     +----------+   |
          | PAUSED |     | RETRYING |   |
          +---+----+     +----+-----+   |
              |               |         v
         user RESUME     backoff,   +--------+
              |          recreate   | PAUSED |
              v          task       +--------+
          +---------+         |
          | PENDING |    task result
          +----+----+    /         \
               |       success     fail
               v       |            |
            RUNNING    v            v
                    RUNNING      RETRYING
                   (count=0)    (keep, count++)
   
          RETRYING --> user PAUSE --> PAUSED
          any non-final --> user STOP --> STOPPED
   ```
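
The diagram can also be read as a transition table. The sketch below transcribes its edges into an `EnumMap`. The class `StreamingJobTransitions` and the methods `canTransition` and `checkResume` are hypothetical; per the Key changes table, the real checks live in `AbstractJob` and `ResumeJobCommand` and may differ in detail. Treating `PAUSED` as the only resumable state is an assumption drawn from the diagram, which shows user RESUME only out of `PAUSED`:

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum JobStatus { PENDING, RUNNING, RETRYING, PAUSED, STOPPED, FINISHED }

// Hypothetical transition table transcribed from the state diagram.
final class StreamingJobTransitions {
    private static final Map<JobStatus, Set<JobStatus>> ALLOWED = new EnumMap<>(JobStatus.class);

    static {
        ALLOWED.put(JobStatus.PENDING, EnumSet.of(JobStatus.RUNNING, JobStatus.STOPPED));
        ALLOWED.put(JobStatus.RUNNING, EnumSet.of(
                JobStatus.PAUSED,    // user PAUSE, or unrecoverable data quality error
                JobStatus.RETRYING,  // recoverable task / meta / schedule failure
                JobStatus.FINISHED,  // hasReachedEnd
                JobStatus.STOPPED)); // user STOP
        ALLOWED.put(JobStatus.RETRYING, EnumSet.of(
                JobStatus.RUNNING,   // recreated task succeeded, count reset to 0
                JobStatus.RETRYING,  // recreated task failed again, status kept, count++
                JobStatus.PAUSED,    // user PAUSE
                JobStatus.STOPPED)); // user STOP
        ALLOWED.put(JobStatus.PAUSED, EnumSet.of(JobStatus.PENDING, JobStatus.STOPPED)); // RESUME / STOP
        ALLOWED.put(JobStatus.STOPPED, EnumSet.noneOf(JobStatus.class));  // final
        ALLOWED.put(JobStatus.FINISHED, EnumSet.noneOf(JobStatus.class)); // final
    }

    static boolean canTransition(JobStatus from, JobStatus to) {
        return ALLOWED.get(from).contains(to);
    }

    // A RETRYING job is already recovering on its own, so RESUME is rejected;
    // in this sketch only PAUSED jobs may be resumed.
    static void checkResume(JobStatus current) {
        if (current != JobStatus.PAUSED) {
            throw new IllegalStateException("cannot RESUME a job in status " + current);
        }
    }
}
```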
   
   ### Key changes
   
   | File | Change |
   |------|--------|
   | `JobStatus.java` | Add `RETRYING` enum, include in `isRunning()` |
   | `AbstractJob.java` | Allow RETRYING state transitions |
   | `StreamingJobSchedulerTask.java` | New `handleRetryingState()` with backoff + task recreation; `PAUSED` does nothing |
   | `StreamingInsertJob.java` | `onStreamTaskFail`/`fetchMeta` → RETRYING; `onStreamTaskSuccess` → RUNNING + reset count; `gsonPostProcess` null fallback for downgrade safety |
   | `ResumeJobCommand.java` | Reject RESUME on RETRYING jobs |
   | `StreamingTaskScheduler.java` | Schedule failure → RETRYING |
   | `AbstractJobStatusTest.java` | Add RETRYING transition tests |
   | Regression tests (5 files) | Update expected status from PAUSED to RETRYING |
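
As a rough model of the retry bookkeeping (backoff in `handleRetryingState()`, count reset in `onStreamTaskSuccess`), the sketch below keeps a failure counter and computes a capped exponential delay. The class `RetryState`, the 1 s base delay, the 60 s cap, and incrementing the counter on the first failure are all illustrative assumptions; the PR does not state the actual constants or formula:

```java
// Hypothetical model of RETRYING bookkeeping; names and constants are
// assumptions, not the values used in StreamingJobSchedulerTask.
final class RetryState {
    static final long BASE_DELAY_MS = 1_000L;  // assumed first backoff
    static final long MAX_DELAY_MS = 60_000L;  // assumed upper bound

    enum Status { RUNNING, RETRYING, PAUSED }

    Status status = Status.RUNNING;
    int autoResumeCount = 0;
    long lastFailureMs = 0L;

    // Recoverable failure (task, fetch meta, schedule): enter or stay in RETRYING.
    void onRecoverableFailure(long nowMs) {
        status = Status.RETRYING;
        autoResumeCount++;
        lastFailureMs = nowMs;
    }

    // Recreated task succeeded: back to RUNNING with the counter reset.
    void onTaskSuccess() {
        status = Status.RUNNING;
        autoResumeCount = 0;
    }

    // Exponential backoff: BASE * 2^(count - 1), capped at MAX_DELAY_MS.
    static long backoffMs(int count) {
        int shift = Math.min(Math.max(count - 1, 0), 30); // clamp to avoid overflow
        return Math.min(BASE_DELAY_MS << shift, MAX_DELAY_MS);
    }

    // Polled by the scheduler: true once the backoff has elapsed and the task
    // should be recreated. PAUSED (and RUNNING) jobs are never touched here.
    boolean shouldRecreateTask(long nowMs) {
        return status == Status.RETRYING
                && nowMs - lastFailureMs >= backoffMs(autoResumeCount);
    }
}
```

In this sketch the cap keeps a job with persistently bad credentials probing at a bounded interval rather than backing off indefinitely.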
   
   ## Test plan
   
   - [ ] UT: `AbstractJobStatusTest` covers all RETRYING state transitions
   - [ ] Regression: `test_streaming_insert_job_alter_aksk`: alter the job to use wrong credentials, then verify the RETRYING status
   - [ ] Regression: `test_streaming_insert_job_fetch_meta_error`: inject a fetch meta failure via a debug point, then verify RETRYING
   - [ ] Regression: `test_streaming_job_schedule_task_error`: inject a scheduling failure via a debug point, then verify RETRYING
   - [ ] Regression: `test_streaming_insert_job_task_retry`: trigger a task timeout, then verify RETRYING
   - [ ] Regression: CDC `test_streaming_job_cdc_stream_postgres_latest_alter_cred`: alter the job to use wrong PostgreSQL credentials, then verify RETRYING
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
