[
https://issues.apache.org/jira/browse/FLINK-30883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683403#comment-17683403
]
Matthias Pohl edited comment on FLINK-30883 at 2/2/23 1:04 PM:
---------------------------------------------------------------
{code}
Feb 01 14:53:49 deployment.apps/flink-native-k8s-application-ha-1 condition met
Feb 01 14:53:49 Waiting for job (flink-native-k8s-application-ha-1-65d85b768b-7q5nr) to have at least 3 completed checkpoints ...
Feb 01 14:55:57 Waiting for jobmanager pod flink-native-k8s-application-ha-1-65d85b768b-7q5nr ready.
Feb 01 14:55:57 pod/flink-native-k8s-application-ha-1-65d85b768b-7q5nr condition met
Feb 01 14:55:57 Waiting for log "Restoring job from Checkpoint"...
Feb 01 14:56:31 Log "Restoring job from Checkpoint" shows up.
{code}
It appears that the {{job_id}} wasn't properly extracted in
[test_kubernetes_application_ha.sh:71|https://github.com/apache/flink/blob/6cce68dcdc1baf4be2a9e1549983d010644b5ee3/flink-end-to-end-tests/test-scripts/test_kubernetes_application_ha.sh#L71].
{{jm_job_name}} is provided
(flink-native-k8s-application-ha-1-65d85b768b-7q5nr; verifiable through the
logs shown above). The most probable reason is that the job wasn't submitted.
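
One way to make this failure mode easier to diagnose would be to check the
extracted ID before it is passed to the cancel command. The following is only
a sketch, not the code from the test script: the helper names
{{extract_job_id}} and {{require_job_id}} are hypothetical, and the
"Job <id> is submitted" log pattern is an assumption about what the
JobManager log contains.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads JobManager log text on stdin and prints the
# first job ID found. ASSUMPTION: the log contains a line of the form
# "Job <hex id> is submitted".
extract_job_id() {
  grep -E -o 'Job [a-f0-9]+ is submitted' | awk '{print $2}' | head -n 1
}

# Hypothetical guard: prints the ID if it is non-empty; otherwise reports
# the problem on stderr and returns non-zero, so the caller can stop before
# running a cancel command without a JobID.
require_job_id() {
  local job_id="$1"
  if [ -z "${job_id}" ]; then
    echo "No job ID found in the JobManager logs; the job may not have been submitted." >&2
    return 1
  fi
  echo "${job_id}"
}
```

In a test script this could be used roughly as
{{job_id=$(kubectl logs <jobmanager pod> | extract_job_id)}} followed by
{{require_job_id "${job_id}" || exit 1}}, which would surface a missing
submission directly instead of a later {{CliArgsException}}.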
> Missing JobID caused the k8s e2e test to fail
> ---------------------------------------------
>
> Key: FLINK-30883
> URL: https://issues.apache.org/jira/browse/FLINK-30883
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes, Runtime / Coordination
> Affects Versions: 1.17.0
> Reporter: Matthias Pohl
> Priority: Critical
> Labels: test-stability
>
> We've experienced a test failure in {{Run kubernetes application HA test}}
> due to a {{CliArgsException}}:
> {code}
> Feb 01 15:03:15 org.apache.flink.client.cli.CliArgsException: Missing JobID. Specify a JobID to cancel a job.
> Feb 01 15:03:15 	at org.apache.flink.client.cli.CliFrontend.cancel(CliFrontend.java:689) ~[flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15 	at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1107) ~[flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15 	at org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189) ~[flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15 	at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) [flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15 	at org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189) [flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15 	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157) [flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=45569&view=logs&j=bea52777-eaf8-5663-8482-18fbc3630e81&s=ae4f8708-9994-57d3-c2d7-b892156e7812&t=b2642e3a-5b86-574d-4c8a-f7e2842bfb14&l=9866