[ 
https://issues.apache.org/jira/browse/BEAM-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977497#comment-15977497
 ] 

Stephen Sisk commented on BEAM-2005:
------------------------------------

Some additional questions that I think are related to the registration question 
we're discussing here:

As discussed above, Hadoop FileSystem can be used to access multiple types of 
filesystems (S3, HDFS, etc.).

1) However, FileSystemRegistrar only allows one scheme to be registered per 
FileSystemRegistrar, so a single class can only handle one scheme. We could 
either change the interface to allow registering multiple schemes, or create 
multiple classes that all inherit from a base class and each declare a 
separate scheme (e.g. s3HadoopFileSystem, HdfsHadoopFileSystem, etc.).

2) Additionally, Hadoop filesystems are configured via Configuration objects 
(e.g. the options discussed in 
https://issues.apache.org/jira/browse/HADOOP-10400 for S3). That means a user 
should probably be able to configure those options and have multiple 
connections per scheme (i.e. "I want to connect to two different HDFS 
instances"). Looking at how the Beam FileSystem is currently implemented, it's 
not clear to me that it is possible to handle this scenario today.

This second question shouldn't block getting the simple "I can read from one 
HDFS instance" case working, but it does seem important in the long run.
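
To make both questions concrete, here is a minimal Java sketch. It is not Beam 
or Hadoop code: MultiSchemeRegistrar, HadoopRegistrar, Registry, and Settings 
are hypothetical names invented for illustration, and Settings only stands in 
for a Hadoop Configuration. It assumes the first option from question 1 (one 
registrar declaring several URI schemes) and, for question 2, assumes that 
settings could be keyed by scheme plus authority rather than by scheme alone.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these types are Beam or Hadoop classes.
final class SchemeRegistrySketch {

    /** Stand-in for a per-connection settings object (a Hadoop Configuration in the real case). */
    static final class Settings {
        final String label;

        Settings(String label) {
            this.label = label;
        }
    }

    /** Question 1, first option: a registrar that declares several schemes at once. */
    interface MultiSchemeRegistrar {
        List<String> getSchemes();
    }

    static final class HadoopRegistrar implements MultiSchemeRegistrar {
        @Override
        public List<String> getSchemes() {
            return Arrays.asList("hdfs", "s3");
        }
    }

    /** Question 2: settings are keyed by scheme and authority, not by scheme alone. */
    static final class Registry {
        private final Set<String> schemes = new HashSet<>();
        private final Map<String, Settings> settingsByKey = new HashMap<>();

        void register(MultiSchemeRegistrar registrar) {
            schemes.addAll(registrar.getSchemes());
        }

        void configure(String scheme, String authority, Settings settings) {
            if (!schemes.contains(scheme)) {
                throw new IllegalArgumentException("Unregistered scheme: " + scheme);
            }
            settingsByKey.put(scheme + "://" + authority, settings);
        }

        Settings lookup(URI uri) {
            Settings s = settingsByKey.get(uri.getScheme() + "://" + uri.getAuthority());
            if (s == null) {
                throw new IllegalArgumentException("No settings for " + uri);
            }
            return s;
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.register(new HadoopRegistrar());
        // Two HDFS instances under the same scheme, each with its own settings.
        registry.configure("hdfs", "cluster-a:8020", new Settings("A"));
        registry.configure("hdfs", "cluster-b:8020", new Settings("B"));
        System.out.println(registry.lookup(URI.create("hdfs://cluster-b:8020/data/part-0")).label);
    }
}
```

The second option from question 1 (one subclass per scheme) would keep a 
single-scheme registrar interface and add one small class per scheme instead. 
Whether keying on the authority is the right granularity for question 2 is an 
open design choice, not something the current interface is known to support.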

cc [~davor] [[email protected]]

> Add a Hadoop FileSystem implementation of Beam's FileSystem
> -----------------------------------------------------------
>
>                 Key: BEAM-2005
>                 URL: https://issues.apache.org/jira/browse/BEAM-2005
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-extensions
>            Reporter: Stephen Sisk
>            Assignee: Stephen Sisk
>             Fix For: First stable release
>
>
> Beam's FileSystem creates an abstraction for reading from files in many 
> different places. 
> We should add a Hadoop FileSystem implementation 
> (https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html)
>  - that would enable us to read from any file system that implements 
> FileSystem (including HDFS, azure, s3, etc..)
> I'm investigating this now.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
