[
https://issues.apache.org/jira/browse/FLINK-10435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zili Chen updated FLINK-10435:
------------------------------
Affects Version/s: 1.9.1
> Client sporadically hangs after Ctrl + C
> ----------------------------------------
>
> Key: FLINK-10435
> URL: https://issues.apache.org/jira/browse/FLINK-10435
> Project: Flink
> Issue Type: Bug
> Components: Command Line Client, Deployment / YARN
> Affects Versions: 1.5.5, 1.6.2, 1.7.0, 1.9.1
> Reporter: Gary Yao
> Priority: Major
> Fix For: 1.6.5, 1.7.3
>
>
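> When submitting a YARN job cluster in attached mode, the client can hang
> indefinitely if Ctrl + C is pressed while the cluster is still being deployed
> (in the log below, right after the application reaches state ACCEPTED). One
> can recover from this by sending SIGKILL to the client process.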
> *Command to submit job*
> {code}
> HADOOP_CLASSPATH=`hadoop classpath` bin/flink run -m yarn-cluster examples/streaming/WordCount.jar
> {code}
>
> *Output/Stacktrace*
> {code}
> [hadoop@ip-172-31-45-22 flink-1.5.4]$ HADOOP_CLASSPATH=`hadoop classpath` bin/flink run -m yarn-cluster examples/streaming/WordCount.jar
> Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/home/hadoop/flink-1.5.4/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 2018-09-26 12:01:04,241 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at ip-172-31-45-22.eu-central-1.compute.internal/172.31.45.22:8032
> 2018-09-26 12:01:04,386 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2018-09-26 12:01:04,386 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2018-09-26 12:01:04,402 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
> 2018-09-26 12:01:04,598 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1024, numberTaskManagers=1, slotsPerTaskManager=1}
> 2018-09-26 12:01:04,972 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - The configuration directory ('/home/hadoop/flink-1.5.4/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
> 2018-09-26 12:01:07,857 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting application master application_1537944258063_0017
> 2018-09-26 12:01:07,913 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1537944258063_0017
> 2018-09-26 12:01:07,913 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting for the cluster to be allocated
> 2018-09-26 12:01:07,916 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, current state ACCEPTED
> ^C2018-09-26 12:01:08,851 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cancelling deployment from Deployment Failure Hook
> 2018-09-26 12:01:08,854 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Killing YARN application
> ------------------------------------------------------------
> The program finished with the following exception:
> org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
>     at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:410)
>     at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:258)
>     at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:214)
>     at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1025)
>     at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>     at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
> Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state KILLED during deployment.
> Diagnostics from YARN: Application application_1537944258063_0017 was killed by user hadoop at 172.31.45.22
> If log aggregation is enabled on your cluster, use this command to further investigate the issue:
> yarn logs -applicationId application_1537944258063_0017
>     at org.apache.flink.yarn.AbstractYarnClusterDescriptor.startAppMaster(AbstractYarnClusterDescriptor.java:1059)
>     at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:532)
>     at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:403)
>     ... 9 more
> 2018-09-26 12:01:09,065 INFO org.apache.hadoop.io.retry.RetryInvocationHandler - Exception while invoking ApplicationClientProtocolPBClientImpl.forceKillApplication over null. Retrying after sleeping for 30000ms.
> java.io.IOException: The client is stopped
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1519)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1381)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1345)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>     at com.sun.proxy.$Proxy8.forceKillApplication(Unknown Source)
>     at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.forceKillApplication(ApplicationClientProtocolPBClientImpl.java:213)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
>     at com.sun.proxy.$Proxy9.forceKillApplication(Unknown Source)
>     at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.killApplication(YarnClientImpl.java:439)
>     at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.killApplication(YarnClientImpl.java:419)
>     at org.apache.flink.yarn.AbstractYarnClusterDescriptor.failSessionDuringDeployment(AbstractYarnClusterDescriptor.java:1236)
>     at org.apache.flink.yarn.AbstractYarnClusterDescriptor.access$200(AbstractYarnClusterDescriptor.java:111)
>     at org.apache.flink.yarn.AbstractYarnClusterDescriptor$DeploymentFailureHook.run(AbstractYarnClusterDescriptor.java:1493)
> {code}
> *Expected behavior*
> The client should shut down the YARN cluster and exit.
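
The second stack trace above shows the Deployment Failure Hook calling YarnClientImpl.killApplication after the client was already stopped, with Hadoop's RetryInvocationHandler sleeping 30000ms before retrying. Purely as an illustration (this is not the fix applied in Flink, and `BoundedShutdown`, `runWithTimeout`, and `stuckKill` are hypothetical names introduced here), the following Java sketch shows one way a shutdown hook could bound how long its cleanup is allowed to block, so that the process can still exit:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedShutdown {

    /**
     * Runs the cleanup action on a daemon thread and waits at most
     * timeoutMillis for it. Returns true only if it completed normally in time.
     */
    public static boolean runWithTimeout(Runnable cleanup, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-cleanup");
            t.setDaemon(true); // a stuck cleanup must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(cleanup);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker, e.g. out of a retry sleep
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            return false; // the cleanup itself threw
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a kill call that blocks in a 30 s retry sleep.
        Runnable stuckKill = () -> {
            try {
                Thread.sleep(30_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        // The hook gives the cleanup 500 ms, then lets the JVM exit regardless.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            boolean finished = runWithTimeout(stuckKill, 500);
            System.out.println(finished
                    ? "cleanup finished"
                    : "cleanup timed out, exiting anyway");
        }));
    }
}
```

The worker is a daemon thread and is interrupted on timeout, so a cleanup call stuck in a retry sleep does not keep the JVM alive. Whether interrupting a real YARN client call is safe is an assumption that would need to be checked against the Hadoop client in use.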