boxes created FLINK-39488:
-----------------------------
Summary: [release-2.2] bin/flink run fails with JobStatusInfo
NPE when job is in INITIALIZING state
Key: FLINK-39488
URL: https://issues.apache.org/jira/browse/FLINK-39488
Project: Flink
Issue Type: Bug
Components: Client / Job Submission
Affects Versions: 2.2.0
Reporter: boxes
## Summary
When submitting a streaming job via `bin/flink run -d <jar>` to a Standalone
cluster built from the `release-2.2` branch, the CLI client fails with a
NullPointerException while polling the job initialization status.
The job is in fact successfully submitted and finishes correctly on the server
side, but the CLI exits with code 1 and no indication of success.
## Environment
- Flink version: 2.2-SNAPSHOT (release-2.2 branch)
- Build: ./mvnw clean install -DskipTests -Dfast -pl flink-dist -am
- JDK: Amazon Corretto 17.0.14 (aarch64)
- OS: macOS 15.2 (Apple Silicon)
- Deployment: Standalone single-node (start-cluster.sh)
- Test job: examples/streaming/WordCount.jar
## Steps to Reproduce
1. Check out the release-2.2 branch
2. Build: ./mvnw clean install -DskipTests -Dfast -pl flink-dist -am
3. Start cluster: bin/start-cluster.sh
4. Submit: bin/flink run -d examples/streaming/WordCount.jar
## Actual Behavior
The CLI prints the following stack trace and exits with code 1:
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Failed to execute job 'WordCount'.
Caused by: java.lang.RuntimeException: Error while waiting for job to be initialized
    at org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished
Caused by: org.apache.flink.runtime.rest.util.RestClientException: Response was neither of the expected type ([simple type, class org.apache.flink.runtime.messages.webmonitor.JobStatusInfo]) nor an error.
Caused by: com.fasterxml.jackson.databind.exc.ValueInstantiationException: Cannot construct instance of `JobStatusInfo`, problem: `NullPointerException`
Caused by: java.lang.NullPointerException
    at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:59)
    at org.apache.flink.runtime.messages.webmonitor.JobStatusInfo.<init>(JobStatusInfo.java:41)
Meanwhile GET /jobs/overview confirms the job reached FINISHED in ~755ms,
proving the JobManager side is healthy.
## Expected Behavior
bin/flink run -d should return exit code 0 after the job is successfully
submitted, or at minimum not crash with NPE during the initialization poll.
## Root Cause Analysis
The JobStatusInfo constructor enforces checkNotNull(jobStatus):
@JsonCreator
public JobStatusInfo(@JsonProperty(FIELD_NAME_STATUS) JobStatus jobStatus) {
    this.jobStatus = checkNotNull(jobStatus);
}
The server-side JobStatusHandler produces the response via:
gateway.requestJobStatus(jobId, timeout).thenApply(JobStatusInfo::new);
When the job is still INITIALIZING and JobManagerRunner has not yet produced
a non-null JobStatus, the serialized payload is {"status": null}. Jackson
cannot deserialize this because of the checkNotNull guard, which causes a
fatal client-side NPE in the poll loop
(ClientUtils.waitUntilJobInitializationFinished).
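
The failure mode described above can be illustrated without Flink or Jackson. The following is only a minimal sketch: `StatusInfoSketch` is a hypothetical stand-in for `JobStatusInfo`, `Objects.requireNonNull` stands in for Flink's `Preconditions.checkNotNull`, and the status is modeled as a plain `String` rather than the real `JobStatus` enum. It shows that any creator-style constructor with a null guard will throw as soon as the payload carries a null status.

```java
import java.util.Objects;

/** Hypothetical, simplified stand-in for Flink's JobStatusInfo (not the real class). */
final class StatusInfoSketch {
    private final String status;

    // Same kind of guard as the constructor quoted above: null is rejected outright.
    StatusInfoSketch(String status) {
        this.status = Objects.requireNonNull(status, "status");
    }

    String getStatus() {
        return status;
    }

    /** Returns true if building the object from the given payload value throws an NPE. */
    static boolean failsOnConstruction(String statusFromPayload) {
        try {
            new StatusInfoSketch(statusFromPayload);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("RUNNING -> fails: " + failsOnConstruction("RUNNING"));
        System.out.println("null    -> fails: " + failsOnConstruction(null));
    }
}
```

In the real client, Jackson would wrap this exception in a `ValueInstantiationException`, which matches the cause chain in the stack trace.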
## Workaround
Submit via REST API directly (bypasses the client-side deserialization path):
curl -F "jarfile=@examples/streaming/WordCount.jar" http://localhost:8081/jars/upload
curl -X POST -H 'Content-Type: application/json' \
    -d '{"parallelism":1}' \
    http://localhost:8081/jars/<id>/run
## Suggested Fix
1. Server-side: Never emit {"status": null}; return INITIALIZING as a
placeholder.
2. Client-side: Relax checkNotNull in JobStatusInfo and treat null as
"unknown/initializing" in ClientUtils.waitUntilJobInitializationFinished.
3. Protocol-level: Make the field @JsonInclude(NON_NULL) and let the client
tolerate a missing status field.
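
As a rough illustration of option 2, the sketch below shows a poll loop that treats a null status as "still initializing" instead of failing. It is not the actual `ClientUtils` code: the class `TolerantInitPollSketch`, its `Status` enum, the `Supplier`-based status source, and the attempt/pause parameters are all assumptions made for this example.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

/** Hypothetical sketch of a null-tolerant initialization poll; not Flink's ClientUtils. */
final class TolerantInitPollSketch {
    /** Simplified status set for the example; the real JobStatus enum has more values. */
    enum Status { INITIALIZING, RUNNING, FINISHED, FAILED }

    /**
     * Polls the supplier until it yields a status other than INITIALIZING.
     * A null result is treated the same as INITIALIZING rather than as an error.
     */
    static Status waitUntilInitialized(Supplier<Status> statusSupplier, int maxAttempts, Duration pause) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Status status = statusSupplier.get();
            if (status != null && status != Status.INITIALIZING) {
                return status;
            }
            try {
                Thread.sleep(pause.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop polling.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for job initialization", e);
            }
        }
        throw new IllegalStateException("Job still initializing after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        // Simulated server responses: null, then INITIALIZING, then RUNNING.
        Iterator<Status> responses =
                Arrays.<Status>asList(null, Status.INITIALIZING, Status.RUNNING).iterator();
        System.out.println(waitUntilInitialized(responses::next, 5, Duration.ofMillis(10)));
    }
}
```

Bounding the number of attempts keeps the client from spinning forever if the server never reports a concrete status; the right limit and back-off would need to follow whatever the existing poll loop already uses.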