Aviem Zur commented on BEAM-2005:

Yes, it makes sense for the code to live in an extension, and a BoM/archetypes 
plus good documentation will help users get up and running.

However, I still think the case I mentioned will happen in practice:
bq. a user creates a project from scratch, adds a dependency on a runner (say 
direct runner), uses TextIO to do a word count and it works for them when 
passing "file://path/to/file", changing this to "hdfs://path/to/file" will not 
work.

So in this case the user will have to resort to looking up documentation on how 
to achieve what they wanted.

What we could do, if we don't want {{core}} bloated with dependencies 
on all filesystems out of the box, is at least keep a {{scheme}} -> {{module}} 
mapping that can be used to display an informative error message such as:
bq. "To enable HDFS support add a dependency on sdk-java-extensions-hadoop"
A similar message would apply to the other filesystem schemes that our 
extension modules support.
This could be achieved by a static {{Map<String, String>}} in {{core}}.
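
A minimal sketch of what such a mapping might look like, to make the idea concrete. The class name {{SchemeHints}}, the method name, and the fallback message are all hypothetical; only the {{hdfs}} entry and the wording of the hint come from the suggestion above, and where exactly {{core}} would call this is left open.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: not an existing Beam class.
final class SchemeHints {
  // Static scheme -> module table. Only the "hdfs" entry is taken from the
  // comment above; further schemes would be added as extensions appear.
  private static final Map<String, String> SCHEME_TO_MODULE;

  static {
    Map<String, String> m = new HashMap<>();
    m.put("hdfs", "sdk-java-extensions-hadoop");
    SCHEME_TO_MODULE = Collections.unmodifiableMap(m);
  }

  private SchemeHints() {}

  // Builds the message to show when no filesystem is registered for a scheme.
  // Falls back to a generic message (assumed wording) for unknown schemes.
  static String missingFileSystemMessage(String scheme) {
    String module = SCHEME_TO_MODULE.get(scheme.toLowerCase(Locale.ROOT));
    if (module == null) {
      return "No filesystem registered for scheme '" + scheme + "'";
    }
    return "To enable " + scheme.toUpperCase(Locale.ROOT)
        + " support add a dependency on " + module;
  }
}
```

The lookup is case-insensitive so that "HDFS://..." and "hdfs://..." produce the same hint. Because the table only holds strings, {{core}} would not pick up any new dependencies from it.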

> Add a Hadoop FileSystem implementation of Beam's FileSystem
> -----------------------------------------------------------
>                 Key: BEAM-2005
>                 URL: https://issues.apache.org/jira/browse/BEAM-2005
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-extensions
>            Reporter: Stephen Sisk
>            Assignee: Stephen Sisk
>             Fix For: First stable release
> Beam's FileSystem creates an abstraction for reading from files in many 
> different places. 
> We should add a Hadoop FileSystem implementation 
> (https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html)
>  - that would enable us to read from any file system that implements 
> FileSystem (including HDFS, Azure, S3, etc.)
> I'm investigating this now.
