[
https://issues.apache.org/jira/browse/FLINK-26473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17505920#comment-17505920
]
Thomas Weise commented on FLINK-26473:
--------------------------------------
We should also try to identify the job submission error case, as it is a
frequent issue that requires good feedback to the user. At the moment, the job
submission completes without feedback and the pod goes into CrashLoopBackOff:
{code:java}
NAME                                           READY   STATUS             RESTARTS   AGE
basic-checkpoint-ha-example-5bcc7f7f48-462c7   0/1     CrashLoopBackOff   9          24m
{code}
with the following error message in the container log:
{code:java}
java.util.concurrent.CompletionException: org.apache.flink.client.deployment.application.ApplicationExecutionException: Could not execute application.
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_322]
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_322]
    at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:957) ~[?:1.8.0_322]
    at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940) ~[?:1.8.0_322]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_322]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_322]
    at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:287) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.lambda$runApplicationAsync$2(ApplicationDispatcherBootstrap.java:224) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_322]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_322]
    at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:171) ~[flink-rpc-akka_58b910eb-0f31-4598-92f7-4a5dc70534bd.jar:1.14.3]
    at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) ~[flink-rpc-akka_58b910eb-0f31-4598-92f7-4a5dc70534bd.jar:1.14.3]
    at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$withContextClassLoader$0(ClassLoadingUtils.java:41) ~[flink-rpc-akka_58b910eb-0f31-4598-92f7-4a5dc70534bd.jar:1.14.3]
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49) [flink-rpc-akka_58b910eb-0f31-4598-92f7-4a5dc70534bd.jar:1.14.3]
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48) [flink-rpc-akka_58b910eb-0f31-4598-92f7-4a5dc70534bd.jar:1.14.3]
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [?:1.8.0_322]
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [?:1.8.0_322]
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [?:1.8.0_322]
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) [?:1.8.0_322]
Caused by: org.apache.flink.client.deployment.application.ApplicationExecutionException: Could not execute application.
    ... 13 more
Caused by: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: For input string: "junk2"
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:261) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    ... 12 more
Caused by: java.lang.NumberFormatException: For input string: "junk2"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_322]
    at java.lang.Integer.parseInt(Integer.java:580) ~[?:1.8.0_322]
    at java.lang.Integer.parseInt(Integer.java:615) ~[?:1.8.0_322]
    at org.apache.flink.api.java.utils.AbstractParameterTool.getInt(AbstractParameterTool.java:120) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.streaming.examples.statemachine.StateMachineExample.main(StateMachineExample.java:113) ~[?:?]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_322]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_322]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_322]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_322]
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:261) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
{code}
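The useful part of the trace above is the innermost cause (the
NumberFormatException raised from the job's main method), not the
CompletionException wrapper. As a rough sketch of the kind of feedback that
could be surfaced to the user, the cause chain can be reduced to a one-line
summary. The class RootCauseSummary and its methods are hypothetical names
used only for illustration; they are not part of Flink or the operator, and
how the operator would obtain the exception is not addressed here.

```java
import java.util.concurrent.CompletionException;

/** Hypothetical helper for illustration; not part of Flink or the operator. */
public final class RootCauseSummary {

    private RootCauseSummary() {}

    /** Walks the cause chain and returns the innermost throwable. */
    public static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Stop at the end of the chain; also guard against a self-referential cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    /** Formats the innermost cause as "<class name>: <message>". */
    public static String summarize(Throwable t) {
        Throwable root = rootCause(t);
        return root.getClass().getName() + ": " + root.getMessage();
    }

    public static void main(String[] args) {
        // Rebuild a wrapper chain similar in shape to the one in the log above.
        Throwable root = new NumberFormatException("For input string: \"junk2\"");
        Throwable wrapped = new CompletionException(
                new RuntimeException("Could not execute application.", root));
        System.out.println(summarize(wrapped));
    }
}
```

For the chain built in main, the summary names only the NumberFormatException
and its message, which is the part a user needs in order to fix the job
arguments.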
> Observer should support JobManager deployment crashed or deleted externally
> ---------------------------------------------------------------------------
>
> Key: FLINK-26473
> URL: https://issues.apache.org/jira/browse/FLINK-26473
> Project: Flink
> Issue Type: Sub-task
> Components: Kubernetes Operator
> Reporter: Yang Wang
> Assignee: Thomas Weise
> Priority: Major
>
> Follow the discussion in this PR
> [https://github.com/apache/flink-kubernetes-operator/pull/26#discussion_r817514763].
>
> Currently, {{observeJmDeployment}} does not cover some scenarios, e.g. the
> JobManager deployment crashed or was deleted externally. Once the
> {{JobManagerDeploymentStatus}} reaches {{READY}}, it always stays
> {{READY}}.
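
One way to read the issue description above: the status should be derived
again on every observation instead of being kept once it is READY. The sketch
below only illustrates that idea. The class JmStatusSketch, its Status values,
and the Snapshot fields are all assumptions made for this example and do not
reflect the operator's actual types or its Kubernetes client calls.

```java
import java.util.Optional;

/** Hypothetical sketch; names and fields are assumptions, not operator code. */
public final class JmStatusSketch {

    /** Hypothetical status values; the operator's real enum may differ. */
    public enum Status { READY, DEPLOYING, ERROR, MISSING }

    /** Hypothetical snapshot of what the cluster reports for the deployment. */
    public static final class Snapshot {
        final int readyReplicas;
        final boolean crashLooping;

        public Snapshot(int readyReplicas, boolean crashLooping) {
            this.readyReplicas = readyReplicas;
            this.crashLooping = crashLooping;
        }
    }

    private JmStatusSketch() {}

    /**
     * Derives the status from the current snapshot only. The previous status
     * is deliberately not an input, so READY cannot stick after a crash or an
     * external delete.
     */
    public static Status observe(Optional<Snapshot> deployment) {
        if (!deployment.isPresent()) {
            return Status.MISSING; // deployment was deleted externally
        }
        Snapshot snapshot = deployment.get();
        if (snapshot.crashLooping) {
            return Status.ERROR; // e.g. pod in CrashLoopBackOff
        }
        return snapshot.readyReplicas > 0 ? Status.READY : Status.DEPLOYING;
    }
}
```

With this shape, a deployment that was READY on the previous pass and is gone
on the next one is reported as MISSING rather than READY.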