[ 
https://issues.apache.org/jira/browse/FLINK-30883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683433#comment-17683433
 ] 

Matthias Pohl commented on FLINK-30883:
---------------------------------------

{{jobmanager.0.log}} appears to be the log of the JobManager instance that ran 
before the kill command was executed from within the test. The SIGTERM signal 
is received in {{jobmanager.0.log}} at 14:55:57,715:
{code}
Feb 01 15:03:03 2023-02-01 14:55:57,715 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
{code}
which matches the timestamps of the logs that are printed around the time the 
kill command is triggered in the test code (see {{e2e_test_failure.log}}):
{code}
[...]
Feb 01 14:55:57 Waiting for jobmanager pod flink-native-k8s-application-ha-1-65d85b768b-7q5nr ready.
Feb 01 14:55:57 pod/flink-native-k8s-application-ha-1-65d85b768b-7q5nr condition met
Feb 01 14:55:57 Waiting for log "Restoring job  from Checkpoint"...
Feb 01 14:56:31 Log "Restoring job  from Checkpoint" shows up.
Feb 01 14:56:31 Waiting for job (flink-native-k8s-application-ha-1-65d85b768b-7q5nr) to have at least 1 completed checkpoints ...
Feb 01 14:57:02 Missing JobID. Specify a JobID to cancel a job.
[...]
{code}
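
The timestamp comparison above can also be done mechanically. The following is only a rough sketch, not part of the test code: the helper names ({{flink_timestamp}}, {{e2e_timestamp}}, {{lines_near}}) are made up for illustration, it assumes both logs were written against the same clock, and it assumes a locale with English month abbreviations for parsing the {{Feb 01 ...}} prefix.

```python
import re
from datetime import datetime

# Inner Flink log timestamp, e.g. "2023-02-01 14:55:57,715".
FLINK_TS = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")
# Leading prefix of the e2e test output, e.g. "Feb 01 14:55:57".
E2E_TS = re.compile(r"^[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2}")


def flink_timestamp(line):
    """Return the Flink log timestamp found in a line, or None."""
    match = FLINK_TS.search(line)
    if match is None:
        return None
    base = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return base.replace(microsecond=int(match.group(2)) * 1000)


def e2e_timestamp(line, year):
    """Return the e2e prefix timestamp of a line, or None.

    The prefix carries no year, so the caller has to supply one.
    """
    match = E2E_TS.match(line)
    if match is None:
        return None
    return datetime.strptime(f"{year} {match.group(0)}", "%Y %b %d %H:%M:%S")


def lines_near(signal_line, e2e_lines, window_seconds=1.0):
    """Return the e2e lines stamped within window_seconds of the signal."""
    signal_time = flink_timestamp(signal_line)
    if signal_time is None:
        raise ValueError("no Flink timestamp found in signal line")
    near = []
    for line in e2e_lines:
        stamp = e2e_timestamp(line, signal_time.year)
        if stamp is None:
            continue  # e.g. "[...]" or wrapped continuation lines
        if abs((stamp - signal_time).total_seconds()) <= window_seconds:
            near.append(line)
    return near
```

Passing the SIGTERM line from {{jobmanager.0.log}} together with the lines of {{e2e_test_failure.log}} would list the test output printed within one second of the signal. Wrapped continuation lines carry no timestamp prefix and are skipped, so the logs should be unwrapped first.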

> Missing JobID caused the k8s e2e test to fail
> ---------------------------------------------
>
>                 Key: FLINK-30883
>                 URL: https://issues.apache.org/jira/browse/FLINK-30883
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes, Runtime / Coordination
>    Affects Versions: 1.17.0
>            Reporter: Matthias Pohl
>            Priority: Critical
>              Labels: test-stability
>         Attachments: e2e_test_failure.log, flink-vsts-client-fv-az378-840.log, jobmanager.0.log, jobmanager.1.log, taskmanager.log
>
>
> We've experienced a test failure in {{Run kubernetes application HA test}} 
> due to a {{CliArgsException}}:
> {code}
> Feb 01 15:03:15 org.apache.flink.client.cli.CliArgsException: Missing JobID. Specify a JobID to cancel a job.
> Feb 01 15:03:15       at org.apache.flink.client.cli.CliFrontend.cancel(CliFrontend.java:689) ~[flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15       at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1107) ~[flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15       at org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189) ~[flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15       at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) [flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15       at org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189) [flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15       at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157) [flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=45569&view=logs&j=bea52777-eaf8-5663-8482-18fbc3630e81&s=ae4f8708-9994-57d3-c2d7-b892156e7812&t=b2642e3a-5b86-574d-4c8a-f7e2842bfb14&l=9866



