[ 
https://issues.apache.org/jira/browse/FLINK-26473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17505920#comment-17505920
 ] 

Thomas Weise commented on FLINK-26473:
--------------------------------------

We should also try to identify the job submission error case, since it is a 
frequent issue and the user needs good feedback when it happens. At the moment 
the job submission completes without feedback and the pod goes into 
CrashLoopBackOff:
{code:java}
NAME                                           READY   STATUS             RESTARTS   AGE
basic-checkpoint-ha-example-5bcc7f7f48-462c7   0/1     CrashLoopBackOff   9          24m
{code}
with this error message in the container log:
{code:java}
java.util.concurrent.CompletionException: org.apache.flink.client.deployment.application.ApplicationExecutionException: Could not execute application.
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_322]
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_322]
        at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:957) ~[?:1.8.0_322]
        at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940) ~[?:1.8.0_322]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_322]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_322]
        at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:287) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
        at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.lambda$runApplicationAsync$2(ApplicationDispatcherBootstrap.java:224) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_322]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_322]
        at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:171) ~[flink-rpc-akka_58b910eb-0f31-4598-92f7-4a5dc70534bd.jar:1.14.3]
        at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) ~[flink-rpc-akka_58b910eb-0f31-4598-92f7-4a5dc70534bd.jar:1.14.3]
        at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$withContextClassLoader$0(ClassLoadingUtils.java:41) ~[flink-rpc-akka_58b910eb-0f31-4598-92f7-4a5dc70534bd.jar:1.14.3]
        at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49) [flink-rpc-akka_58b910eb-0f31-4598-92f7-4a5dc70534bd.jar:1.14.3]
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48) [flink-rpc-akka_58b910eb-0f31-4598-92f7-4a5dc70534bd.jar:1.14.3]
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [?:1.8.0_322]
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [?:1.8.0_322]
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [?:1.8.0_322]
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) [?:1.8.0_322]
Caused by: org.apache.flink.client.deployment.application.ApplicationExecutionException: Could not execute application.
        ... 13 more
Caused by: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: For input string: "junk2"
        at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
        at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
        at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
        at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:261) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
        ... 12 more
Caused by: java.lang.NumberFormatException: For input string: "junk2"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_322]
        at java.lang.Integer.parseInt(Integer.java:580) ~[?:1.8.0_322]
        at java.lang.Integer.parseInt(Integer.java:615) ~[?:1.8.0_322]
        at org.apache.flink.api.java.utils.AbstractParameterTool.getInt(AbstractParameterTool.java:120) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
        at org.apache.flink.streaming.examples.statemachine.StateMachineExample.main(StateMachineExample.java:113) ~[?:?]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_322]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_322]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_322]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_322]
        at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
        at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
        at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
        at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:261) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
{code}
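
As a rough illustration of the kind of feedback that could be surfaced to the 
user, the sketch below picks the innermost "Caused by:" entry out of a container 
log like the one above. It is only a sketch: the class and method names 
(RootCauseSketch, rootCause) are hypothetical and not part of the operator, and 
it assumes the log text has already been fetched from the pod by some other 
means.

```java
import java.util.Optional;

public class RootCauseSketch {
    // Prefix that the JVM prints in front of each nested cause in a stack trace.
    private static final String MARKER = "Caused by: ";

    /**
     * Returns the text of the last "Caused by:" line in the given log,
     * which is the innermost cause of a printed stack trace.
     * Hypothetical helper, not an operator API.
     */
    public static Optional<String> rootCause(String containerLog) {
        String last = null;
        // \R matches any line break, so both \n and \r\n logs are handled.
        for (String line : containerLog.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith(MARKER)) {
                last = trimmed.substring(MARKER.length());
            }
        }
        return Optional.ofNullable(last);
    }

    public static void main(String[] args) {
        // Shortened excerpt of the container log from this comment.
        String log = String.join("\n",
            "java.util.concurrent.CompletionException: org.apache.flink.client.deployment.application.ApplicationExecutionException: Could not execute application.",
            "Caused by: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: For input string: \"junk2\"",
            "Caused by: java.lang.NumberFormatException: For input string: \"junk2\"",
            "        at java.lang.Integer.parseInt(Integer.java:580) ~[?:1.8.0_322]");
        // prints: java.lang.NumberFormatException: For input string: "junk2"
        System.out.println(rootCause(log).orElse("no cause found"));
    }
}
```

Scanning the log text is a deliberately simple choice here; it only works when 
the full stack trace is present in the log and each entry is on a single line.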

> Observer should support JobManager deployment crashed or deleted externally
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-26473
>                 URL: https://issues.apache.org/jira/browse/FLINK-26473
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Kubernetes Operator
>            Reporter: Yang Wang
>            Assignee: Thomas Weise
>            Priority: Major
>
> Follow-up to the discussion in this PR 
> [https://github.com/apache/flink-kubernetes-operator/pull/26#discussion_r817514763].
>  
> Currently, {{observeJmDeployment}} still does not cover some scenarios, e.g. 
> the JobManager deployment crashed or was deleted externally. Once 
> {{JobManagerDeploymentStatus}} reaches {{READY}}, it always stays {{READY}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
