Hi Beam,

I am new to Beam on Spark and recently got the following error:


Caused by: java.lang.IllegalArgumentException: The HadoopFileSystemRegistrar currently only supports at most a single Hadoop configuration.
        at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:141) ~[beam-vendor-guava-26_0-jre-0.1.jar:?]
        at org.apache.beam.sdk.io.hdfs.HadoopFileSystemRegistrar.fromOptions(HadoopFileSystemRegistrar.java:60) ~[beam-sdks-java-io-hadoop-file-system-3.2250.5.jar:?]
        at org.apache.beam.sdk.io.FileSystems.verifySchemesAreUnique(FileSystems.java:496) ~[beam-sdks-java-core-3.2250.5.jar:?]
        at org.apache.beam.sdk.io.FileSystems.setDefaultPipelineOptions(FileSystems.java:486) ~[beam-sdks-java-core-3.2250.5.jar:?]
        at org.apache.beam.sdk.PipelineRunner.fromOptions(PipelineRunner.java:47) ~[beam-sdks-java-core-3.2250.5.jar:?]
        at org.apache.beam.sdk.Pipeline.create(Pipeline.java:149) ~[beam-sdks-java-core-3.2250.5.jar:?]


To debug it, I logged the HDFS configurations with:

    List<Configuration> configurations =
        pipelineOpts.as(HadoopFileSystemOptions.class).getHdfsConfiguration();
    LOG.info("print hdfsConfiguration for testing: " + configurations.toString());


2020-11-19 18:02:26.289 [main] HelloBeam [INFO] print hdfsConfiguration for testing:
[Configuration: /export/content/lid/apps/samza-yarn-nodemanager/1d5c39c31bb33e3dd8e8149168167870328a014b/genConfig/core-site.xml,
 Configuration: /export/content/lid/apps/samza-yarn-nodemanager/1d5c39c31bb33e3dd8e8149168167870328a014b/genConfig/core-site.xml]


As you can see, hdfsConfiguration is a list containing two identical entries, and that is what triggers the error.

I noticed that the configurations are built from the HADOOP_CONF_DIR and YARN_CONF_DIR environment variables, and the code that builds them uses a Set to deduplicate the directories. However, in my test environment the two variables are:


HADOOP_CONF_DIR=/export/content/lid/apps/samza-yarn-nodemanager/1d5c39c31bb33e3dd8e8149168167870328a014b/bin/../genConfig/

YARN_CONF_DIR=/export/content/lid/apps/samza-yarn-nodemanager/1d5c39c31bb33e3dd8e8149168167870328a014b/bin/../genConfig


HADOOP_CONF_DIR has a trailing '/', so the two values are different strings even though they point to the same directory. The Set therefore keeps both, and the same core-site.xml is loaded twice.
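To illustrate, here is a minimal JDK-only sketch (not Beam's actual code) of why deduplicating by raw string keeps both entries, and how normalizing each path with java.nio.file.Path#normalize before adding it to the Set could collapse them. The class and method names (ConfDirDedup, dedupRaw, dedupNormalized) are hypothetical, and it assumes Unix-style paths.

    import java.nio.file.Paths;
    import java.util.LinkedHashSet;
    import java.util.Set;

    final class ConfDirDedup {
        // Hypothetical: dedup by raw string, which is the behavior described above.
        static Set<String> dedupRaw(String... dirs) {
            Set<String> out = new LinkedHashSet<>(); // keeps insertion order
            for (String d : dirs) {
                if (d != null && !d.isEmpty()) {
                    out.add(d);
                }
            }
            return out;
        }

        // Hypothetical fix: normalize each path first. Parsing drops the trailing '/',
        // and normalize() lexically resolves "..", so equivalent spellings collapse.
        // Note: this does not follow symlinks (toRealPath() would, but needs the dir to exist).
        static Set<String> dedupNormalized(String... dirs) {
            Set<String> out = new LinkedHashSet<>();
            for (String d : dirs) {
                if (d != null && !d.isEmpty()) {
                    out.add(Paths.get(d).normalize().toString());
                }
            }
            return out;
        }

        public static void main(String[] args) {
            String base = "/export/content/lid/apps/samza-yarn-nodemanager/"
                + "1d5c39c31bb33e3dd8e8149168167870328a014b/bin/../genConfig";
            String hadoopConfDir = base + "/"; // trailing slash, as in HADOOP_CONF_DIR
            String yarnConfDir = base;         // no trailing slash, as in YARN_CONF_DIR
            System.out.println(dedupRaw(hadoopConfDir, yarnConfDir).size());
            System.out.println(dedupNormalized(hadoopConfDir, yarnConfDir).size());
        }
    }

With raw strings the Set holds two entries; after normalization it holds one. In the meantime, removing the trailing '/' from HADOOP_CONF_DIR should avoid the duplicate on my side.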


Is this expected behavior, or is it a bug we should fix?


Thanks in advance. I hope to hear from you soon.


Best,

Yuhong
