[
https://issues.apache.org/jira/browse/BEAM-9315?focusedWorklogId=387697&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387697
]
ASF GitHub Bot logged work on BEAM-9315:
----------------------------------------
Author: ASF GitHub Bot
Created on: 14/Feb/20 21:44
Start Date: 14/Feb/20 21:44
Worklog Time Spent: 10m
Work Description: iemejia commented on issue #10866: [BEAM-9315] Allow
multiple paths via HADOOP_CONF_DIR in HadoopFileSystemOptions
URL: https://github.com/apache/beam/pull/10866#issuecomment-586484647
Merged manually to squash the commits and fix the commit title. Thanks for
bringing this one, Claudio (@ventuc)!
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 387697)
Time Spent: 1.5h (was: 1h 20m)
> HadoopFileSystemOptions unable to interpret HADOOP_CONF_DIR with multiple
> paths
> -------------------------------------------------------------------------------
>
> Key: BEAM-9315
> URL: https://issues.apache.org/jira/browse/BEAM-9315
> Project: Beam
> Issue Type: Improvement
> Components: io-java-hadoop-file-system
> Affects Versions: 2.19.0
> Environment: Cloudera CDH 6.3.2 with Spark 2.4.0 (Scala 2.11)
> Reporter: Claudio Venturini
> Assignee: Claudio Venturini
> Priority: Major
> Fix For: 2.20.0
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> In some Hadoop deployments the {{HADOOP_CONF_DIR}} environment variable
> can contain multiple paths. For example, when running {{spark-submit}},
> Cloudera 6.3 sets it as follows:
> {{HADOOP_CONF_DIR=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/conf/yarn-conf:/etc/hive/conf}}
> Currently the {{HadoopFileSystemOptions}} class reads the content of the
> variable but treats it as a single path. When the variable contains multiple
> paths, Beam cannot configure Hadoop properly, so HDFS can't be accessed. The
> only workarounds I'm aware of are:
> - Override the {{HADOOP_CONF_DIR}} that Cloudera sets for the Spark service.
> I think this could cause problems with other tools (for example when using
> Hive from Spark, because Spark might not be able to find the Hive config).
> - Pass the HDFS configuration using the {{--hdfsConfiguration}} option. This
> is inconvenient when there are many settings, and they would not be updated
> automatically when reconfigured in Cloudera Manager.
> In my opinion, to fix this the {{HadoopFileSystemOptions}} class should split
> the content of the {{HADOOP_CONF_DIR}} environment variable on the colon (":")
> so that every path it contains is detected.
> I have already implemented this fix, and all tests for the
> {{HadoopFileSystemOptions}} class pass. I'm preparing a pull request.
>
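The splitting approach proposed in the description can be illustrated with a
minimal sketch. It is not the code merged in the pull request, which this
message does not show. The class name {{HadoopConfDirs}} and both method names
are hypothetical, the sketch uses only the JDK, and it merely locates
{{core-site.xml}} and {{hdfs-site.xml}} in each directory instead of building
a Hadoop configuration. Using {{File.pathSeparator}} in {{main}} is an
assumption; the issue itself proposes a literal colon.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Beam: shows how a multi-path
// HADOOP_CONF_DIR value could be split and scanned for site files.
final class HadoopConfDirs {

  // Splits a HADOOP_CONF_DIR-style value on the given separator and
  // drops empty entries (for example those produced by "a::b").
  static List<String> splitConfDirs(String value, String separator) {
    List<String> dirs = new ArrayList<>();
    if (value == null) {
      return dirs;
    }
    for (String part : value.split(Pattern.quote(separator))) {
      String trimmed = part.trim();
      if (!trimmed.isEmpty()) {
        dirs.add(trimmed);
      }
    }
    return dirs;
  }

  // Returns the core-site.xml and hdfs-site.xml files that actually exist
  // in the given directories, in directory order.
  static List<File> findSiteFiles(List<String> dirs) {
    List<File> found = new ArrayList<>();
    for (String dir : dirs) {
      for (String name : new String[] {"core-site.xml", "hdfs-site.xml"}) {
        File candidate = new File(dir, name);
        if (candidate.isFile()) {
          found.add(candidate);
        }
      }
    }
    return found;
  }

  public static void main(String[] args) {
    // Assumption: the platform path separator (":" on Unix-like systems).
    List<String> dirs =
        splitConfDirs(System.getenv("HADOOP_CONF_DIR"), File.pathSeparator);
    for (File file : findSiteFiles(dirs)) {
      System.out.println(file);
    }
  }
}
```

With the Cloudera value quoted in the description, {{splitConfDirs}} yields
two entries, the Spark {{yarn-conf}} directory and {{/etc/hive/conf}}, and
each one is checked for site files independently.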