Can you please elaborate on what solved your issue, so that others who run into it can learn from your experience?
Best,
D.

On Wed, Dec 1, 2021 at 3:38, chenqizhu <qizhu...@163.com> wrote:

> Hi,
>
> My problem has been solved. Thank you again.
>
> Best regards
>
> On 2021-12-01 09:58:52, "chenqizhu" <qizhu...@163.com> wrote:
>
> Hi David,
>
> I'm glad you replied.
>
> -- this exception doesn't seem to come from Flink, but rather from a
> YARN container bootstrap.
> -- In this case the exception happens before any Flink code is executed
> by the NodeManager.
>
> If that's the case, how does the NodeManager know about the 'BCluster'
> that I configured in Flink?
>
> In short, there are two HDFS clusters and I want to access the one
> (called BCluster) that is not the default of the Flink client. (The
> YARN cluster contains all nodes of both HDFS clusters.)
>
> There are more details in JIRA FLINK-25099
> <https://issues.apache.org/jira/browse/FLINK-25099>
>
> At 2021-11-30 21:50:08, "David Morávek" <d...@apache.org> wrote:
>
> Hi chenqizhu,
>
> When a YARN container starts up, it needs to download resources from
> HDFS (your job archives / configuration / distributed cache / ...)
> which are necessary for starting the user application (in Flink's case
> the JobManager / TaskManager). As far as I can tell, the affected
> NodeManager tries to pull data from a filesystem it doesn't have access
> to (refer to hdfs-site.xml / YARN logs on the particular node).
>
>> Question: why can't flink-conf (flink.hadoop.*) override the
>> configuration read by the YARN NodeManager?
>
> In this case the exception happens before any Flink code is executed
> by the NodeManager.
>
> I think the NM logs can help you identify which files are not
> accessible by YARN; that could narrow it down a bit.
>
> Best,
> D.
>
> On Tue, Nov 30, 2021 at 9:23 AM chenqizhu <qizhu...@163.com> wrote:
>
>> Hi,
>>
>> Flink 1.13 supports configuring Hadoop properties in flink-conf.yaml
>> via flink.hadoop.*. There is a requirement to write checkpoints to an
>> HDFS cluster backed by SSDs (called BCluster) to speed up checkpoint
>> writes, but this cluster is not the default HDFS of the Flink client
>> (called ACluster). flink-conf.yaml is configured with nameservices for
>> both cluster A and cluster B, similar to HDFS federation.
>>
>> The configuration is as follows:
>>
>> flink.hadoop.dfs.nameservices: ACluster,BCluster
>> flink.hadoop.fs.defaultFS: hdfs://BCluster
>> flink.hadoop.dfs.ha.namenodes.ACluster: nn1,nn2
>> flink.hadoop.dfs.namenode.rpc-address.ACluster.nn1: 10.xxxx:9000
>> flink.hadoop.dfs.namenode.http-address.ACluster.nn1: 10.xxxx:50070
>> flink.hadoop.dfs.namenode.rpc-address.ACluster.nn2: 10.xxxxxx:9000
>> flink.hadoop.dfs.namenode.http-address.ACluster.nn2: 10.xxxxxx:50070
>> flink.hadoop.dfs.client.failover.proxy.provider.ACluster:
>> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
>>
>> flink.hadoop.dfs.ha.namenodes.BCluster: nn1,nn2
>> flink.hadoop.dfs.namenode.rpc-address.BCluster.nn1: 10.xxxxxx:9000
>> flink.hadoop.dfs.namenode.http-address.BCluster.nn1: 10.xxxxxx:50070
>> flink.hadoop.dfs.namenode.rpc-address.BCluster.nn2: 10.xxxxxx:9000
>> flink.hadoop.dfs.namenode.http-address.BCluster.nn2: 10.xxxxx:50070
>> flink.hadoop.dfs.client.failover.proxy.provider.BCluster:
>> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
>>
>> However, an error occurred during job startup, reported as follows.
>> (When the configuration is switched back to the Flink client's default
>> HDFS cluster, i.e. flink.hadoop.fs.defaultFS: hdfs://ACluster, the job
>> starts up normally.)
>>
>> Failing this attempt. Diagnostics: [2021-11-30
>> 15:39:15.582] java.net.UnknownHostException: BCluster
>>
>> java.lang.IllegalArgumentException: java.net.UnknownHostException: BCluster
>> at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:448)
>> at org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:139)
>> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:374)
>> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:308)
>> at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:184)
>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414)
>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158)
>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474)
>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442)
>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524)
>> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
>> at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:270)
>> at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68)
>> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415)
>> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
>> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412)
>> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247)
>> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240)
>> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.net.UnknownHostException: BCluster
>> ... 28 more
>>
>> Question: why can't flink-conf (flink.hadoop.*) override the
>> configuration read by the YARN NodeManager?
>> Is there a solution to the above problem? The pain point is enabling
>> Flink to access two HDFS clusters, preferably through the
>> configuration in flink-conf.yaml.
>> The attachment is the client log file.
>>
>> Best regards
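
[Editor's note: the thread does not spell out the fix, so the following is a sketch of the likely one. The flink.hadoop.* options only configure the Hadoop client inside the Flink processes; the stack trace above fails in FSDownload, i.e. the NodeManager's container localizer, which reads the node's own Hadoop configuration. Assuming the same HA nameservice layout as in the flink-conf.yaml above, defining the BCluster nameservice in the hdfs-site.xml on every YARN node would let the localizer resolve hdfs://BCluster. Addresses below are placeholders:

  <property>
    <name>dfs.nameservices</name>
    <value>ACluster,BCluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.BCluster</name>
    <value>nn1,nn2</value>
  </property>
  <!-- placeholder addresses; use the real BCluster NameNode hosts -->
  <property>
    <name>dfs.namenode.rpc-address.BCluster.nn1</name>
    <value>10.x.x.x:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.BCluster.nn2</name>
    <value>10.x.x.x:9000</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.BCluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>

The NodeManagers typically need a restart to pick up the change; the Flink-side flink.hadoop.* entries can stay as they are.]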