chenqizhu created FLINK-25099: --------------------------------- Summary: flink on yarn Accessing two HDFS Clusters Key: FLINK-25099 URL: https://issues.apache.org/jira/browse/FLINK-25099 Project: Flink Issue Type: Bug Components: Deployment / YARN Affects Versions: 1.13.3 Environment: flink : 1.13.3 hadoop : 3.3.0 Reporter: chenqizhu Fix For: 1.13.3
Flink version 1.13 supports configuration of Hadoop properties in flink-conf.yaml via flink.hadoop.*. There is A requirement to write checkpoint to HDFS with SSDS (called cluster B) to speed checkpoint writing, but this HDFS cluster is not the default HDFS in the flink client (called cluster A by default). Yaml is configured with nameservices for cluster A and cluster B, which is similar to HDFS federated mode. The configuration is as follows: {code:java} flink.hadoop.dfs.nameservices: ACluster,BCluster flink.hadoop.fs.defaultFS: hdfs://BCluster flink.hadoop.dfs.ha.namenodes.ACluster: nn1,nn2 flink.hadoop.dfs.namenode.rpc-address.ACluster.nn1: 10.xxxx:9000 flink.hadoop.dfs.namenode.http-address.ACluster.nn1: 10.xxxx:50070 flink.hadoop.dfs.namenode.rpc-address.ACluster.nn2: 10.xxxxxx:9000 flink.hadoop.dfs.namenode.http-address.ACluster.nn2: 10.xxxxxx:50070 flink.hadoop.dfs.client.failover.proxy.provider.ACluster: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider flink.hadoop.dfs.ha.namenodes.BCluster: nn1,nn2 flink.hadoop.dfs.namenode.rpc-address.BCluster.nn1: 10.xxxxxx:9000 flink.hadoop.dfs.namenode.http-address.BCluster.nn1: 10.xxxxxx:50070 flink.hadoop.dfs.namenode.rpc-address.BCluster.nn2: 10.xxxxxx:9000 flink.hadoop.dfs.namenode.http-address.BCluster.nn2: 10.xxxxx:50070 flink.hadoop.dfs.client.failover.proxy.provider.BCluster: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider {code} However, an error occurred during the startup of the job, which is reported as follows: (change configuration items to A flink local client default HDFS cluster, the operation can be normal boot: flink.hadoop.fs.DefaultFS: hdfs: / / ACluster) {noformat} Caused by: BCluster java.net.UnknownHostException: BCluster at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:448) at org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:139) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:374) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:308) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:184) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365) at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:270) at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745){noformat} Is there a solution to the above problems? The pain point is that Flink can access two HDFS clusters, preferably through the configuration of Flink-conf. yaml. -- This message was sent by Atlassian Jira (v8.20.1#820001)