[
https://issues.apache.org/jira/browse/BEAM-7613?focusedWorklogId=264759&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264759
]
ASF GitHub Bot logged work on BEAM-7613:
----------------------------------------
Author: ASF GitHub Bot
Created on: 21/Jun/19 16:37
Start Date: 21/Jun/19 16:37
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #8923: [BEAM-7613]
HadoopFileSystem can work with more than one cluster.
URL: https://github.com/apache/beam/pull/8923#discussion_r296309586
##########
File path:
sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java
##########
@@ -313,7 +328,7 @@ protected HadoopResourceId matchNewResource(String singleResourceSpec, boolean i
   @Override
   protected String getScheme() {
-    return fileSystem.getScheme();
+    return "hdfs";
Review comment:
I don't think we need to provide an `AuthorityAwareFileSystem` in Beam. As
long as the HadoopFileSystem knows how to handle multiple authorities by
selecting the appropriate configuration for each one, we are good.
The other schemes that HDFS currently supports, such as webhdfs, ftp, etc.,
should each be registered as a separate HadoopFileSystem instance.
If Hadoop starts to support arbitrary schemes (hdfs1, hdfs2, ...), users will
be able to access the same authority with different settings, which is why I
agree with failing fast when multiple configurations provide the same
scheme+authority.
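A minimal sketch of the approach the comment describes: pick a configuration per scheme+authority, and reject duplicates when the configurations are first loaded. It assumes each cluster's settings can be modeled as a plain property map whose `fs.defaultFS` entry identifies the cluster; the class and method names are hypothetical and are not part of Beam's or Hadoop's API.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: index per-cluster settings by scheme+authority and
 * fail fast when two configurations claim the same cluster.
 */
final class ClusterConfigIndex {
  // Assumption: a "configuration" is a property map; the real HadoopFileSystem
  // would hold org.apache.hadoop.conf.Configuration objects instead.
  private final Map<String, Map<String, String>> byClusterKey = new HashMap<>();

  ClusterConfigIndex(List<Map<String, String>> configurations) {
    for (Map<String, String> conf : configurations) {
      String defaultFs = conf.get("fs.defaultFS");
      if (defaultFs == null) {
        throw new IllegalArgumentException("configuration is missing fs.defaultFS");
      }
      String key = clusterKey(URI.create(defaultFs));
      // Fail fast: two configurations for the same scheme+authority are ambiguous.
      if (byClusterKey.putIfAbsent(key, conf) != null) {
        throw new IllegalArgumentException("duplicate configuration for " + key);
      }
    }
  }

  /** Returns the configuration whose cluster matches the path's scheme+authority. */
  Map<String, String> configurationFor(URI path) {
    Map<String, String> conf = byClusterKey.get(clusterKey(path));
    if (conf == null) {
      throw new IllegalArgumentException("no configuration for " + clusterKey(path));
    }
    return conf;
  }

  // A cluster is identified by the (scheme, authority) pair, e.g. "hdfs://nn1:8020".
  private static String clusterKey(URI uri) {
    return uri.getScheme() + "://" + uri.getAuthority();
  }
}
```

Doing the duplicate check in the constructor means a misconfigured pipeline fails once, at setup, rather than silently routing paths to whichever configuration happened to be registered last.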
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 264759)
Time Spent: 1h (was: 50m)
> HadoopFileSystem can be only used with fs.defaultFS
> ---------------------------------------------------
>
> Key: BEAM-7613
> URL: https://issues.apache.org/jira/browse/BEAM-7613
> Project: Beam
> Issue Type: Bug
> Components: io-java-hadoop-file-system
> Affects Versions: 2.13.0
> Reporter: David Moravek
> Assignee: David Moravek
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
>
> _HadoopFileSystem_ creates the underlying _FileSystem_ instance (the one from
> org.apache.hadoop) during its construction. A single _FileSystem_
> instance is tied to a particular cluster (scheme + authority pair). If
> we want to talk to another cluster, this fails due to _FileSystem#checkPath_.
>
> This can be fixed by using _FileSystem#get(java.net.URI,
> org.apache.hadoop.conf.Configuration)_ instead of
> _FileSystem#newInstance(org.apache.hadoop.conf.Configuration)_.
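The failure mode and the proposed fix can be modeled without Hadoop on the classpath. In this hypothetical sketch, `BoundFileSystem` stands in for a Hadoop `FileSystem` tied to one scheme+authority and rejects paths from any other cluster (the role the issue attributes to `FileSystem#checkPath`), while `FileSystemCache` resolves an instance from each path's own URI, which is the idea behind switching to `FileSystem#get(java.net.URI, org.apache.hadoop.conf.Configuration)`. Neither class exists in Hadoop or Beam, and Hadoop's real caching is more involved than this.

```java
import java.net.URI;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a file system bound to a single scheme+authority. */
final class BoundFileSystem {
  private final String scheme;
  private final String authority;

  BoundFileSystem(String scheme, String authority) {
    this.scheme = scheme;
    this.authority = authority;
  }

  /** Rejects any path that does not belong to this instance's cluster. */
  void checkPath(URI path) {
    if (!Objects.equals(scheme, path.getScheme())
        || !Objects.equals(authority, path.getAuthority())) {
      throw new IllegalArgumentException(
          "path " + path + " does not belong to " + scheme + "://" + authority);
    }
  }
}

/** Hypothetical cache returning one instance per scheme+authority seen in a path. */
final class FileSystemCache {
  private final Map<String, BoundFileSystem> cache = new ConcurrentHashMap<>();

  BoundFileSystem get(URI path) {
    // Key on scheme+authority so every path on the same cluster shares an instance.
    String key = path.getScheme() + "://" + path.getAuthority();
    return cache.computeIfAbsent(
        key, k -> new BoundFileSystem(path.getScheme(), path.getAuthority()));
  }
}
```

With one instance created up front from `fs.defaultFS`, every path on a second cluster would hit the rejection branch in `checkPath`; resolving the instance from the path's URI sidesteps that.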
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)