Can you please elaborate on what solved your issue, so others who run
into it can learn from your experience?

Best,
D.

On Wed 1. 12. 2021 at 3:38, chenqizhu <qizhu...@163.com> wrote:

> Hi,
>
>   My problem has been solved. Thank you again.
>
> Best regards
>
> On 2021-12-01 09:58:52, "chenqizhu" <qizhu...@163.com> wrote:
>
> Hi David,
>
>    Thanks for your reply.
>
>    -- this exception doesn't seem to come from Flink, but rather from a
> YARN container bootstrap.
>    -- In this case the exception happens before any Flink code is executed
> by the NodeManager.
>
>     If that's the case, how does the NodeManager know about the 'BCluster'
> I configured in Flink?
>
>
>     In short, there are two HDFS clusters and I want to access the one
> called BCluster, which is not the default cluster of the Flink client.
> (The YARN nodes span all the nodes of both HDFS clusters.)
>
>
>    More details can be found in JIRA FLINK-25099
> <https://issues.apache.org/jira/browse/FLINK-25099>
>
>
>
> At 2021-11-30 21:50:08, "David Morávek" <d...@apache.org> wrote:
>
> Hi chenqizhu,
>
> When a YARN container starts up, it needs to download resources from HDFS
> (your job archives / configuration / distributed cache / ...) that are
> necessary to start the user application (in Flink's case the JobManager /
> TaskManager). As far as I can tell, the affected NodeManager tries to pull
> data from a filesystem it doesn't have access to (check hdfs-site.xml /
> the YARN logs on that particular node).
>
> question: Why can't flink-conf (flink.hadoop.*) override the
>> configuration read by the YARN NodeManager?
>>
>
> In this case the exception happens before any Flink code is executed by
> the NodeManager.
>
> I think the NM logs can help you identify which files are not accessible
> to YARN; that could narrow it down a bit.
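>
> For example (the application id below is just a placeholder):
>
> yarn logs -applicationId application_xxxxxxxxxx_xxxx
>
> or look directly at the NodeManager log on the affected host.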
>
> Best,
> D.
>
> On Tue, Nov 30, 2021 at 9:23 AM chenqizhu <qizhu...@163.com> wrote:
>
>> hi,
>>     Flink 1.13 supports configuring Hadoop properties in flink-conf.yaml
>> via flink.hadoop.*. There is a requirement to write checkpoints to an HDFS
>> cluster backed by SSDs (called BCluster) to speed up checkpointing, but
>> that cluster is not the default HDFS of the Flink client (called ACluster).
>> flink-conf.yaml is configured with nameservices for both cluster A and
>> cluster B, similar to HDFS federation mode.
>>
>> The configuration is as follows:
>>
>> flink.hadoop.dfs.nameservices: ACluster,BCluster
>> flink.hadoop.fs.defaultFS: hdfs://BCluster
>> flink.hadoop.dfs.ha.namenodes.ACluster: nn1,nn2
>> flink.hadoop.dfs.namenode.rpc-address.ACluster.nn1: 10.xxxx:9000
>> flink.hadoop.dfs.namenode.http-address.ACluster.nn1: 10.xxxx:50070
>> flink.hadoop.dfs.namenode.rpc-address.ACluster.nn2: 10.xxxxxx:9000
>> flink.hadoop.dfs.namenode.http-address.ACluster.nn2: 10.xxxxxx:50070
>> flink.hadoop.dfs.client.failover.proxy.provider.ACluster: 
>> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
>>
>> flink.hadoop.dfs.ha.namenodes.BCluster: nn1,nn2
>> flink.hadoop.dfs.namenode.rpc-address.BCluster.nn1: 10.xxxxxx:9000
>> flink.hadoop.dfs.namenode.http-address.BCluster.nn1: 10.xxxxxx:50070
>> flink.hadoop.dfs.namenode.rpc-address.BCluster.nn2: 10.xxxxxx:9000
>> flink.hadoop.dfs.namenode.http-address.BCluster.nn2: 10.xxxxx:50070
>> flink.hadoop.dfs.client.failover.proxy.provider.BCluster: 
>> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
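>>
>> For reference, the checkpoints are pointed at BCluster with a setting
>> along these lines (the exact path is just an example):
>>
>> state.checkpoints.dir: hdfs://BCluster/flink/checkpoints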
>>
>> However, an error occurred during job startup, reported as follows.
>>
>> (If the configuration is switched back to the Flink client's default HDFS
>> cluster, i.e. flink.hadoop.fs.defaultFS: hdfs://ACluster, the job starts
>> normally.)
>>
>>
>> Failing this attempt.Diagnostics: [2021-11-30 
>> 15:39:15.582]java.net.UnknownHostException: BCluster
>>
>> java.lang.IllegalArgumentException: java.net.UnknownHostException: BCluster
>>
>>      at 
>> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:448)
>>      at 
>> org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:139)
>>      at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:374)
>>      at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:308)
>>      at 
>> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:184)
>>      at 
>> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414)
>>      at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158)
>>      at 
>> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474)
>>      at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442)
>>      at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524)
>>      at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
>>      at 
>> org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:270)
>>      at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68)
>>      at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415)
>>      at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412)
>>      at java.security.AccessController.doPrivileged(Native Method)
>>      at javax.security.auth.Subject.doAs(Subject.java:422)
>>      at 
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
>>      at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412)
>>      at 
>> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247)
>>      at 
>> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240)
>>      at 
>> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228)
>>      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>      at 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>      at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>      at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>      at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.net.UnknownHostException: BCluster
>>
>>      ... 28 more
>>
>>
>> Question: why can't flink-conf (flink.hadoop.*) override the
>> configuration read by the YARN NodeManager?
>> Is there a solution to the above problem? The pain point is getting Flink
>> to access both HDFS clusters, preferably through flink-conf.yaml alone.
>> The attachment is the client log file.
>>
>> Best regards
>>
