[ 
https://issues.apache.org/jira/browse/BEAM-9315?focusedWorklogId=387420&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387420
 ]

ASF GitHub Bot logged work on BEAM-9315:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Feb/20 15:10
            Start Date: 14/Feb/20 15:10
    Worklog Time Spent: 10m 
      Work Description: RyanSkraba commented on pull request #10866: 
[BEAM-9315] Read HADOOP_CONF_DIR and YARN_CONF_DIR with multi paths
URL: https://github.com/apache/beam/pull/10866#discussion_r379479831
 
 

 ##########
 File path: 
sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptionsTest.java
 ##########
 @@ -159,6 +239,43 @@ public void 
testDefaultSetYarnConfDirAndHadoopConfDirNotSameConfiguration() thro
     assertThat(configurationList.get(1 - hadoopConfIndex).get("propertyD"), 
Matchers.equalTo("D"));
   }
 
+  @Test
 
 Review comment:
   This last test is theoretically correct, but doesn't add much value (there 
can only be one Configuration today).  I'd get rid of it.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 387420)
    Time Spent: 0.5h  (was: 20m)

> HadoopFileSystemOptions unable to interpret HADOOP_CONF_DIR with multiple 
> paths
> -------------------------------------------------------------------------------
>
>                 Key: BEAM-9315
>                 URL: https://issues.apache.org/jira/browse/BEAM-9315
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-hadoop-file-system
>    Affects Versions: 2.19.0
>         Environment: Cloudera CDH 6.3.2 with Spark 2.4.0 (Scala 2.11)
>            Reporter: Claudio Venturini
>            Assignee: Claudio Venturini
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In certain Hadoop deployments the {{HADOOP_CONF_DIR}} environment variable 
> can contain multiple paths. For example, when running {{spark-submit}}, 
> Cloudera 6.3 sets it as follows:
> {{HADOOP_CONF_DIR=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/conf/yarn-conf:/etc/hive/conf}}
> Currently the {{HadoopFileSystemOptions}} class reads the content of the 
> variable but treats it as a single path. When the variable contains multiple 
> paths, Beam cannot configure Hadoop properly, and so HDFS can't be 
> accessed. The only workarounds that I'm aware of are:
>  - Override the {{HADOOP_CONF_DIR}} set by Cloudera for the Spark service, 
> but I think this could cause problems with other tools (for example when 
> using Hive from Spark, because I think Spark wouldn't be able to find the 
> Hive configuration)
>  - Pass HDFS configurations using the {{--hdfsConfigurations}} option, but 
> this is inconvenient when there are many settings, and they would not 
> be updated automatically when reconfigured in Cloudera Manager
> In my opinion, to fix this the {{HadoopFileSystemOptions}} class should split 
> the content of the {{HADOOP_CONF_DIR}} environment variable on the colon (":") 
> to detect all the paths it contains.
> I have already implemented this fix, and all tests on the 
> {{HadoopFileSystemOptions}} class pass successfully. I'm preparing a pull request.
>  
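
The splitting proposed in the description could look roughly like the sketch below. This is an illustration only, not the code from the pull request: the class name ConfDirSplitter and its split method are hypothetical, and the colon separator is hard-coded as the issue suggests (which assumes Unix-style paths).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Beam or of the pull request. */
public final class ConfDirSplitter {
  private ConfDirSplitter() {}

  /**
   * Splits a colon-separated value such as the content of HADOOP_CONF_DIR
   * into its individual paths, skipping empty entries.
   */
  public static List<String> split(String value) {
    List<String> paths = new ArrayList<>();
    if (value == null) {
      // Variable not set: no configuration directories to read.
      return paths;
    }
    for (String part : value.split(":")) {
      String trimmed = part.trim();
      if (!trimmed.isEmpty()) {
        paths.add(trimmed);
      }
    }
    return paths;
  }
}
```

With such a helper, each returned path could then be handled the way the single path is handled today, for example by calling it on System.getenv("HADOOP_CONF_DIR") and loading the configuration files found in every directory.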



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
