[jira] [Commented] (BEAM-2005) Add a Hadoop FileSystem implementation of Beam's FileSystem

JIRA Thu, 20 Apr 2017 00:43:15 -0700

    [ 
https://issues.apache.org/jira/browse/BEAM-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976256#comment-15976256
 ]


Ismaël Mejía commented on BEAM-2005:
------------------------------------

[~aviemzur] I think that this should be part of the Hadoop extensions, and the 
SDK core should NOT depend on it, for two reasons:

(1) We have gone through a big process of degooglification of the SDK, so I 
don’t see a strong reason to add a hard dependency on another group of 
libraries like the ones that Hadoop will bring.

(2) The SDK should be object storage neutral, so I don’t think there is a 
particular reason to support HDFS out of the box and not S3 or other storage 
systems, especially since we can register those dynamically via 
BeamFileSystemRegistrar once the dependency is added (like runners do). Note 
that I expect this will also be the case for Google Storage, and that the GCP 
dependencies won’t be needed as part of the core SDK, for neutrality reasons 
too.
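
To illustrate the dynamic registration idea in (2), here is a minimal, 
self-contained sketch. All names in it (SketchFileSystem, 
SketchFileSystemRegistrar, HadoopRegistrar, Registry) are hypothetical and are 
NOT Beam's actual API; the registrars are passed in by hand, whereas a real 
setup would presumably discover them on the classpath with a mechanism such as 
java.util.ServiceLoader once the extension dependency is added.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are Beam's real classes.
public class RegistrarSketch {

    /** Stand-in for a file system implementation, keyed by URI scheme. */
    interface SketchFileSystem {
        String scheme();
    }

    /** Stand-in for the registrar an extension module would ship. */
    interface SketchFileSystemRegistrar {
        List<SketchFileSystem> fileSystems();
    }

    /** What a Hadoop extension module might contribute (assumption). */
    static class HadoopRegistrar implements SketchFileSystemRegistrar {
        @Override
        public List<SketchFileSystem> fileSystems() {
            SketchFileSystem hdfs = new SketchFileSystem() {
                @Override
                public String scheme() {
                    return "hdfs";
                }
            };
            return Collections.singletonList(hdfs);
        }
    }

    /** Core-side registry: it knows only the interfaces, not Hadoop. */
    static class Registry {
        private final Map<String, SketchFileSystem> byScheme = new HashMap<>();

        Registry(Iterable<SketchFileSystemRegistrar> registrars) {
            for (SketchFileSystemRegistrar registrar : registrars) {
                for (SketchFileSystem fs : registrar.fileSystems()) {
                    byScheme.put(fs.scheme(), fs);
                }
            }
        }

        boolean supports(String scheme) {
            return byScheme.containsKey(scheme);
        }
    }

    public static void main(String[] args) {
        // Without the extension, the core has no "hdfs" support.
        Registry coreOnly =
            new Registry(Collections.<SketchFileSystemRegistrar>emptyList());
        // With the extension's registrar available, "hdfs" is supported.
        Registry withHadoop = new Registry(
            Collections.<SketchFileSystemRegistrar>singletonList(new HadoopRegistrar()));

        System.out.println("core only supports hdfs: " + coreOnly.supports("hdfs"));
        System.out.println("with extension supports hdfs: " + withHadoop.supports("hdfs"));
    }
}
```

The point of the sketch is only that the core depends on the two interfaces, 
while the Hadoop-specific class lives in a separate module that users opt 
into.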

However, I agree that for user experience having this support out of the box 
would be nice. We can cover this with better documentation or with some 
starter (batteries included) Maven POMs, e.g. one for people who are fully on 
GCP with all the Google Storage support out of the box, one for people on 
Spark that brings in the Hadoop extensions ready to use, etc.


> Add a Hadoop FileSystem implementation of Beam's FileSystem
> -----------------------------------------------------------
>
>                 Key: BEAM-2005
>                 URL: https://issues.apache.org/jira/browse/BEAM-2005
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-extensions
>            Reporter: Stephen Sisk
>            Assignee: Stephen Sisk
>             Fix For: First stable release
>
>
> Beam's FileSystem creates an abstraction for reading from files in many 
> different places. 
> We should add a Hadoop FileSystem implementation 
> (https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html)
>  - that would enable us to read from any file system that implements 
> FileSystem (including HDFS, Azure, S3, etc.)
> I'm investigating this now.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
