teamconfx opened a new pull request, #27463:
URL: https://github.com/apache/flink/pull/27463

   This PR fixes [FLINK-38870](https://issues.apache.org/jira/browse/FLINK-38870).
   
   ### Problem
   
     When a JobManager loses leadership, its jobs enter the SUSPENDED state. The 
previous error message, "Job completed with illegal status: null", named neither 
the actual state nor its likely cause.
   
   ### Solution
   
     My approach improves on the JIRA proposal by:
     1. Preserving the actual SUSPENDED JobStatus instead of losing it (the 
proposal only added a field but kept setting it to null)
     2. Adding serialization support to preserve SUSPENDED state across REST 
API calls
     3. Maintaining backward compatibility with older clients
   
   ### Files Modified
   
     1. JobResult.java
   
     - Constructor validation: now requires a terminal state rather than a 
globally terminal one, which admits SUSPENDED (a locally terminal state)
     - createFrom(): now stores the actual JobStatus, including SUSPENDED 
(previously it stored null for states that are not globally terminal)
     - toJobExecutionResult(): handles SUSPENDED explicitly and reports a 
detailed error message (quoted in full under "Error Message Comparison")
   
     2. JobResultSerializer.java
   
     - Added a new job-status field that preserves the actual JobStatus in JSON 
(written alongside the existing application-status field for backward 
compatibility)
   
     3. JobResultDeserializer.java
   
     - Reads the new job-status field if present; it takes priority over 
application-status
     - Falls back to application-status when job-status is absent, for backward 
compatibility with messages written by older versions
   
     4. Tests Added
   
     - JobResultTest: 3 new tests for SUSPENDED state handling
     - JobResultDeserializerTest: 3 new tests for serialization with SUSPENDED 
state
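   
   The field precedence described for the serializer and deserializer can be 
sketched roughly as follows. This is a minimal illustration, not the code in this 
PR: it uses a plain Map in place of the real JSON handling, and the class 
JobStatusFieldSketch, its methods, and the status mappings are assumptions made 
for the example. Only the two field names come from the description above.
   
   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;

   public class JobStatusFieldSketch {

       // Field names as given in the PR description.
       static final String FIELD_JOB_STATUS = "job-status";
       static final String FIELD_APPLICATION_STATUS = "application-status";

       /** Writer side: emit both fields so older readers still find application-status. */
       static Map<String, String> write(String jobStatus) {
           Map<String, String> fields = new LinkedHashMap<>();
           fields.put(FIELD_APPLICATION_STATUS, toApplicationStatus(jobStatus));
           fields.put(FIELD_JOB_STATUS, jobStatus);
           return fields;
       }

       /** Reader side: prefer job-status, fall back to application-status. */
       static String read(Map<String, String> fields) {
           String jobStatus = fields.get(FIELD_JOB_STATUS);
           if (jobStatus != null) {
               return jobStatus; // new field takes priority
           }
           String applicationStatus = fields.get(FIELD_APPLICATION_STATUS);
           if (applicationStatus == null) {
               throw new IllegalArgumentException("No status field present");
           }
           return fromApplicationStatus(applicationStatus);
       }

       // Assumed mapping for illustration: anything that is not a globally
       // terminal status collapses to UNKNOWN, which is how SUSPENDED gets lost.
       static String toApplicationStatus(String jobStatus) {
           switch (jobStatus) {
               case "FINISHED": return "SUCCEEDED";
               case "FAILED": return "FAILED";
               case "CANCELED": return "CANCELED";
               default: return "UNKNOWN";
           }
       }

       // Assumed reverse mapping; UNKNOWN cannot be recovered, so it yields null.
       static String fromApplicationStatus(String applicationStatus) {
           switch (applicationStatus) {
               case "SUCCEEDED": return "FINISHED";
               case "FAILED": return "FAILED";
               case "CANCELED": return "CANCELED";
               default: return null;
           }
       }

       public static void main(String[] args) {
           System.out.println(read(write("SUSPENDED")));
       }
   }
   ```
   
   Writing both fields means a reader that only knows application-status keeps 
working, while a reader that understands job-status sees SUSPENDED instead of an 
unknown status.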
   
   ### Test Results
   
   ```
     Tests run: 17, Failures: 0, Errors: 0, Skipped: 0
     BUILD SUCCESS
   ```
   
   ### Error Message Comparison
   
     Before:
   ```
     Job completed with illegal status: null.
   ```
   
     After:
   ```
     Job is in state SUSPENDED. This commonly happens when the JobManager lost 
leadership.
     The job may recover automatically if High Availability and a persistent 
job store are configured.
     If recovery is not possible (e.g., non-persistent ExecutionPlanStore), the 
job needs to be resubmitted.
   ```
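   
   The distinction between terminal and globally terminal states that produces 
this message can be sketched as below. This is a simplified stand-in rather than 
the actual JobResult code: the Status enum, the class SuspendedStatusSketch, and 
both methods are hypothetical and only mirror the behavior described in this PR. 
The message text is copied from the "After" output above.
   
   ```java
   public class SuspendedStatusSketch {

       /** Simplified stand-in for a job status; only the states needed here. */
       enum Status {
           RUNNING(false, false),
           FINISHED(true, true),
           FAILED(true, true),
           CANCELED(true, true),
           SUSPENDED(true, false); // terminal locally, but not globally

           final boolean terminal;
           final boolean globallyTerminal;

           Status(boolean terminal, boolean globallyTerminal) {
               this.terminal = terminal;
               this.globallyTerminal = globallyTerminal;
           }
       }

       /** Relaxed validation: any terminal state is accepted, including SUSPENDED. */
       static boolean accepts(Status status) {
           return status.terminal;
       }

       /** Builds the error text for a result that did not finish successfully. */
       static String describeFailure(Status status) {
           if (status == Status.SUSPENDED) {
               return "Job is in state SUSPENDED. This commonly happens when the "
                       + "JobManager lost leadership. The job may recover automatically "
                       + "if High Availability and a persistent job store are configured. "
                       + "If recovery is not possible (e.g., non-persistent "
                       + "ExecutionPlanStore), the job needs to be resubmitted.";
           }
           return "Job completed with status: " + status + ".";
       }

       public static void main(String[] args) {
           System.out.println(accepts(Status.SUSPENDED));
           System.out.println(describeFailure(Status.SUSPENDED));
       }
   }
   ```
   
   Under the stricter globally terminal check, SUSPENDED would be rejected or 
replaced by null before it ever reached the message-building step, which is why 
the old message could only report "null".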

