boxes created FLINK-39488:
-----------------------------

             Summary: [release-2.2] bin/flink run fails with JobStatusInfo NPE when job is in INITIALIZING state
                 Key: FLINK-39488
                 URL: https://issues.apache.org/jira/browse/FLINK-39488
             Project: Flink
          Issue Type: Bug
          Components: Client / Job Submission
    Affects Versions: 2.2.0
            Reporter: boxes


## Summary
When submitting a streaming job via `bin/flink run -d <jar>` to a Standalone
cluster built from the `release-2.2` branch, the CLI client fails with a
NullPointerException while polling the job initialization status.
The job is in fact successfully submitted and finishes correctly on the server
side, but the CLI exits with code 1 and no indication of success.

## Environment
- Flink version: 2.2-SNAPSHOT (release-2.2 branch)
- Build: ./mvnw clean install -DskipTests -Dfast -pl flink-dist -am
- JDK: Amazon Corretto 17.0.14 (aarch64)
- OS: macOS 15.2 (Apple Silicon)
- Deployment: Standalone single-node (start-cluster.sh)
- Test job: examples/streaming/WordCount.jar

## Steps to Reproduce
1. Check out the release-2.2 branch
2. Build: ./mvnw clean install -DskipTests -Dfast -pl flink-dist -am
3. Start cluster: bin/start-cluster.sh
4. Submit: bin/flink run -d examples/streaming/WordCount.jar

## Actual Behavior
The CLI prints the following stack trace and exits with code 1:

  org.apache.flink.client.program.ProgramInvocationException:
    The main method caused an error: Failed to execute job 'WordCount'.
  Caused by: java.lang.RuntimeException: Error while waiting for job to be initialized
    at org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished
  Caused by: org.apache.flink.runtime.rest.util.RestClientException:
    Response was neither of the expected type
    ([simple type, class org.apache.flink.runtime.messages.webmonitor.JobStatusInfo])
    nor an error.
  Caused by: com.fasterxml.jackson.databind.exc.ValueInstantiationException:
    Cannot construct instance of `JobStatusInfo`, problem: `NullPointerException`
  Caused by: java.lang.NullPointerException
    at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:59)
    at org.apache.flink.runtime.messages.webmonitor.JobStatusInfo.<init>(JobStatusInfo.java:41)

Meanwhile GET /jobs/overview confirms the job reached FINISHED in ~755ms,
proving the JobManager side is healthy.

## Expected Behavior
bin/flink run -d should return exit code 0 after the job is successfully
submitted, or at minimum not crash with an NPE during the initialization poll.

## Root Cause Analysis
JobStatusInfo constructor enforces checkNotNull(jobStatus):

  @JsonCreator
  public JobStatusInfo(@JsonProperty(FIELD_NAME_STATUS) JobStatus jobStatus) {
      this.jobStatus = checkNotNull(jobStatus);
  }

The server-side JobStatusHandler produces the response via:
  gateway.requestJobStatus(jobId, timeout).thenApply(JobStatusInfo::new);

When the job is still INITIALIZING and JobManagerRunner has not yet produced
a non-null JobStatus, the serialized payload is {"status": null}. Jackson
cannot deserialize it because of the checkNotNull guard, which causes a fatal
client-side NPE in the poll loop
(ClientUtils.waitUntilJobInitializationFinished).
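
The failure mode described above can be sketched without Flink or Jackson on the classpath. In this sketch, `Status` and `StatusInfo` are hypothetical stand-ins for `JobStatus` and `JobStatusInfo`, and `Objects.requireNonNull` stands in for `Preconditions.checkNotNull`. It only illustrates that building the response object from a null status throws; in the real client, Jackson wraps that exception in the `ValueInstantiationException` shown in the stack trace.

```java
import java.util.Objects;

final class NullStatusRepro {
    // Hypothetical stand-in for org.apache.flink.api.common.JobStatus.
    enum Status { INITIALIZING, CREATED, RUNNING, FINISHED }

    // Hypothetical stand-in for JobStatusInfo; not the real class.
    static final class StatusInfo {
        private final Status jobStatus;

        StatusInfo(Status jobStatus) {
            // Same effect as checkNotNull(jobStatus): NullPointerException on null.
            this.jobStatus = Objects.requireNonNull(jobStatus);
        }

        Status getJobStatus() {
            return jobStatus;
        }
    }

    /** Returns the simple name of the exception thrown by the constructor, or "none". */
    static String construct(Status status) {
        try {
            new StatusInfo(status);
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(construct(Status.INITIALIZING)); // prints "none"
        System.out.println(construct(null));                // prints "NullPointerException"
    }
}
```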

## Workaround
Submit via REST API directly (bypasses the client-side deserialization path):

  curl -F "jarfile=@<jar>" http://localhost:8081/jars/upload
  curl -X POST -H 'Content-Type: application/json' \
    -d '{"parallelism":1}' \
    http://localhost:8081/jars/<id>/run
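
For anyone scripting the workaround from Java rather than curl, the run request might look roughly like the sketch below, which uses the JDK's `java.net.http` client. The class and method names (`RestRunSketch`, `buildRunRequest`, `run`) are made up for illustration, the jar id is assumed to be URL-safe, and the jar upload step is not covered.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class RestRunSketch {
    /** Builds the POST /jars/<id>/run request with a JSON body carrying the parallelism. */
    static HttpRequest buildRunRequest(String baseUrl, String jarId, int parallelism) {
        String body = "{\"parallelism\":" + parallelism + "}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/jars/" + jarId + "/run"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Sends the request and returns the raw response body (assumes a reachable cluster). */
    static String run(String baseUrl, String jarId, int parallelism) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRunRequest(baseUrl, jarId, parallelism),
                        HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling `RestRunSketch.run("http://localhost:8081", jarId, 1)` with the id returned by the upload call would correspond to the second curl command.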

## Suggested Fix
1. Server-side: Never emit {"status": null}; return INITIALIZING as a
   placeholder.
2. Client-side: Relax checkNotNull in JobStatusInfo and treat null as
   "unknown/initializing" in ClientUtils.waitUntilJobInitializationFinished.
3. Protocol-level: Make the field @JsonInclude(NON_NULL) and let the client
   tolerate a missing status field.
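
A rough sketch of option 2, under the assumption that a null status can safely be read as "still initializing". `TolerantInitPoll`, its `Status` enum, and the `Supplier`-based signature are hypothetical and do not match the real `ClientUtils.waitUntilJobInitializationFinished` signature; the point is only the null-tolerant normalization inside the poll loop.

```java
import java.util.function.Supplier;

final class TolerantInitPoll {
    // Hypothetical stand-in for Flink's JobStatus.
    enum Status { INITIALIZING, CREATED, RUNNING, FINISHED, FAILED }

    /** Maps a missing (null) status to INITIALIZING instead of failing. */
    static Status normalize(Status reported) {
        return reported == null ? Status.INITIALIZING : reported;
    }

    /**
     * Polls the supplier until the normalized status leaves INITIALIZING or the
     * attempts run out, and returns the last normalized status.
     */
    static Status waitUntilInitialized(
            Supplier<Status> statusSupplier, int maxAttempts, long sleepMillis) {
        Status status = Status.INITIALIZING;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            status = normalize(statusSupplier.get());
            if (status != Status.INITIALIZING) {
                return status;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop polling early.
                Thread.currentThread().interrupt();
                return status;
            }
        }
        return status;
    }
}
```

With this shape, a poll sequence of null, INITIALIZING, RUNNING ends with RUNNING rather than an exception on the first response.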



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
