[ https://issues.apache.org/jira/browse/FLINK-20143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231283#comment-17231283 ]

Kostas Kloudas commented on FLINK-20143:
----------------------------------------

Also, I think your second command is not correct. You are using {{-t}}, 
which activates the {{GenericCLI}}, but then you specify parameters using the 
{{FlinkYarnSessionCli}} convention of prefixing them with {{-y}}. Can you verify 
whether the memory specifications you set are actually picked up?
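
For comparison, here is a sketch of that command expressed purely in {{GenericCLI}} terms, i.e. {{-t}} together with {{-D}} dynamic properties. This is an assumption on my part that the {{-ytm}}/{{-yjm}} flags were intended to set the standard process-size options:

{code:bash}
# Sketch only: -t selects the GenericCLI, so all options must be passed as -D
# dynamic properties rather than -y-prefixed YarnSessionCli flags.
./bin/flink run -t yarn-per-job \
  -Dyarn.provided.lib.dirs="hdfs:///flink/flink-1.12-SNAPSHOT/lib" \
  -Djobmanager.memory.process.size=3g \
  -Dtaskmanager.memory.process.size=3g \
  -Dyarn.application.name=flink-1.12-test \
  ./examples/streaming/StateMachineExample.jar
{code}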

> use `yarn.provided.lib.dirs` config deploy job failed in yarn per job mode
> --------------------------------------------------------------------------
>
>                 Key: FLINK-20143
>                 URL: https://issues.apache.org/jira/browse/FLINK-20143
>             Project: Flink
>          Issue Type: Bug
>          Components: Client / Job Submission, Deployment / YARN
>    Affects Versions: 1.12.0
>            Reporter: zhisheng
>            Priority: Major
>
> Deploying a Flink job to YARN with the following command fails:
> {code:java}
> ./bin/flink run -m yarn-cluster -d -ynm flink-1.12-test -ytm 3g -yjm 3g -yD yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib ./examples/streaming/StateMachineExample.jar
> {code}
> log:
> {code:java}
> $ ./bin/flink run -m yarn-cluster -d -ynm flink-1.12-test -ytm 3g -yjm 3g -yD yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib ./examples/streaming/StateMachineExample.jar
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/data1/app/flink-1.12-SNAPSHOT/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/data1/app/hadoop-2.7.3-snappy-32core12disk/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/data1/app/hadoop-2.7.3-snappy-32core12disk/share/hadoop/tools/lib/hadoop-aliyun-2.9.2-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> 2020-11-13 16:14:30,347 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli [] - Dynamic Property set: yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib
> 2020-11-13 16:14:30,347 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli [] - Dynamic Property set: yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib
> Usage with built-in data generator: StateMachineExample [--error-rate <probability-of-invalid-transition>] [--sleep <sleep-per-record-in-ms>]
> Usage with Kafka: StateMachineExample --kafka-topic <topic> [--brokers <brokers>]
> Options for both the above setups: [--backend <file|rocks>] [--checkpoint-dir <filepath>] [--async-checkpoints <true|false>] [--incremental-checkpoints <true|false>] [--output <filepath> OR null for stdout]
> Using standalone source with error rate 0.000000 and sleep delay 1 millis
> 2020-11-13 16:14:30,706 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil [] - The configuration directory ('/data1/app/flink-1.12-SNAPSHOT/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
> 2020-11-13 16:14:30,947 INFO  org.apache.hadoop.yarn.client.AHSProxy [] - Connecting to Application History server at FAT-hadoopuat-69117.vm.dc01.hellocloud.tech/10.69.1.17:10200
> 2020-11-13 16:14:30,958 INFO  org.apache.flink.yarn.YarnClusterDescriptor [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2020-11-13 16:14:31,065 INFO  org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider [] - Failing over to rm2
> 2020-11-13 16:14:31,130 INFO  org.apache.flink.yarn.YarnClusterDescriptor [] - The configured JobManager memory is 3072 MB. YARN will allocate 4096 MB to make up an integer multiple of its minimum allocation memory (2048 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 1024 MB may not be used by Flink.
> 2020-11-13 16:14:31,130 INFO  org.apache.flink.yarn.YarnClusterDescriptor [] - The configured TaskManager memory is 3072 MB. YARN will allocate 4096 MB to make up an integer multiple of its minimum allocation memory (2048 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 1024 MB may not be used by Flink.
> 2020-11-13 16:14:31,130 INFO  org.apache.flink.yarn.YarnClusterDescriptor [] - Cluster specification: ClusterSpecification{masterMemoryMB=3072, taskManagerMemoryMB=3072, slotsPerTaskManager=2}
> 2020-11-13 16:14:31,681 WARN  org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory [] - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
> 2020-11-13 16:14:33,417 INFO  org.apache.flink.yarn.YarnClusterDescriptor [] - Submitting application master application_1599741232083_21990
> 2020-11-13 16:14:33,446 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl [] - Submitted application application_1599741232083_21990
> 2020-11-13 16:14:33,446 INFO  org.apache.flink.yarn.YarnClusterDescriptor [] - Waiting for the cluster to be allocated
> 2020-11-13 16:14:33,448 INFO  org.apache.flink.yarn.YarnClusterDescriptor [] - Deploying cluster, current state ACCEPTED
> ------------------------------------------------------------
>  The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Could not deploy Yarn job cluster.
>     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:330)
>     at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:198)
>     at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
>     at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:743)
>     at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:242)
>     at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:971)
>     at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1047)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>     at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1047)
> Caused by: org.apache.flink.client.deployment.ClusterDeploymentException: Could not deploy Yarn job cluster.
>     at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:460)
>     at org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:70)
>     at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1916)
>     at org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:128)
>     at org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:76)
>     at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1798)
>     at org.apache.flink.streaming.examples.statemachine.StateMachineExample.main(StateMachineExample.java:142)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:316)
>     ... 11 more
> Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
> Diagnostics from YARN: Application application_1599741232083_21990 failed 2 times in previous 10000 milliseconds due to AM Container for appattempt_1599741232083_21990_000002 exited with  exitCode: -1
> Failing this attempt. Diagnostics: [2020-11-13 16:14:38.244] Destination must be relative
> For more detailed output, check the application tracking page: http://FAT-hadoopuat-69117.vm.dc01.hellocloud.tech:8188/applicationhistory/app/application_1599741232083_21990 Then click on links to logs of each attempt. Failing the application.
> If log aggregation is enabled on your cluster, use this command to further investigate the issue:
> yarn logs -applicationId application_1599741232083_21990
>     at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:1078)
>     at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:558)
>     at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:453)
>     ... 22 more
> 2020-11-13 16:14:38,492 INFO  org.apache.flink.yarn.YarnClusterDescriptor [] - Cancelling deployment from Deployment Failure Hook
> 2020-11-13 16:14:38,494 INFO  org.apache.hadoop.yarn.client.AHSProxy [] - Connecting to Application History server at FAT-hadoopuat-69117.vm.dc01.hellocloud.tech/10.69.1.17:10200
> 2020-11-13 16:14:38,495 INFO  org.apache.flink.yarn.YarnClusterDescriptor [] - Killing YARN application
> 2020-11-13 16:14:38,499 INFO  org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider [] - Failing over to rm2
> 2020-11-13 16:14:38,503 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl [] - Killed application application_1599741232083_21990
> 2020-11-13 16:14:38,503 INFO  org.apache.flink.yarn.YarnClusterDescriptor [] - Deleting files in hdfs://flashHadoopUAT/user/deploy/.flink/application_1599741232083_21990.
> {code}
> However, if I set `execution.target: yarn-per-job` in flink-conf.yaml, it runs OK.
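> For reference, a minimal sketch of that flink-conf.yaml setup (an assumption that only these two keys are involved in the working case):
> {code:yaml}
> # Sketch: per-job deployment selected via config instead of the -m/-t flags
> execution.target: yarn-per-job
> yarn.provided.lib.dirs: hdfs:///flink/flink-1.12-SNAPSHOT/lib
> {code}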
> Running in application mode also works:
> {code:java}
> ./bin/flink run-application -p 2 -d -t yarn-application -ytm 3g -yjm 3g -yD yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib ./examples/streaming/StateMachineExample.jar
> {code}
> But the job ID is 00000000000000000000000000000000.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
