[ https://issues.apache.org/jira/browse/BEAM-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976256#comment-15976256 ]
Ismaël Mejía commented on BEAM-2005:
------------------------------------
[~aviemzur] I think this should be part of the Hadoop extensions, and the SDK
core should NOT depend on it, for two reasons:
(1) We have gone through a big process of degooglification of the SDK, so I don't
see a strong reason to add a hard dependency on another group of libraries, such
as the ones Hadoop would bring in.
(2) The SDK should be object-storage neutral, so I don't see a particular reason
to support HDFS out of the box but not S3 or other storage systems. This is
especially true since we can register those dynamically via
BeamFileSystemRegistrar once the dependency is added (as runners do). Note that I
expect the same to apply to Google Cloud Storage: for neutrality reasons, the GCP
dependencies shouldn't need to be part of the core SDK either.
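Point (2) depends on the core resolving filesystems at runtime rather than linking them in at build time. The following is a minimal sketch of that registration pattern under stated assumptions. `SimpleFileSystem`, `Registrar` and `FileSystemRegistry` are hypothetical names and are not Beam's API. As I understand it, Beam discovers its registrars through `java.util.ServiceLoader`. In this sketch they are passed in explicitly so the example runs on its own (Java 9+ for `List.of`).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RegistrarSketch {

    // Hypothetical minimal filesystem abstraction owned by the core SDK.
    interface SimpleFileSystem {
        String scheme();
        String describe(String path);
    }

    // Hypothetical registrar: each extension module (Hadoop, GCP, S3...) ships one.
    interface Registrar {
        List<SimpleFileSystem> fileSystems();
    }

    // Core-side registry: it has no compile-time knowledge of HDFS, S3 or GCS.
    static final class FileSystemRegistry {
        private final Map<String, SimpleFileSystem> byScheme = new HashMap<>();

        FileSystemRegistry(List<Registrar> registrars) {
            for (Registrar r : registrars) {
                for (SimpleFileSystem fs : r.fileSystems()) {
                    // Two modules claiming the same scheme is a configuration error.
                    if (byScheme.putIfAbsent(fs.scheme(), fs) != null) {
                        throw new IllegalStateException("Duplicate scheme: " + fs.scheme());
                    }
                }
            }
        }

        // Picks a filesystem from the path's scheme; paths without "://" are treated as local.
        SimpleFileSystem forPath(String path) {
            int idx = path.indexOf("://");
            String scheme = idx < 0 ? "file" : path.substring(0, idx);
            SimpleFileSystem fs = byScheme.get(scheme);
            if (fs == null) {
                throw new IllegalArgumentException("No filesystem registered for scheme: " + scheme);
            }
            return fs;
        }
    }

    // Helper that builds a trivial filesystem for illustration only.
    static SimpleFileSystem named(String scheme, String label) {
        return new SimpleFileSystem() {
            public String scheme() { return scheme; }
            public String describe(String path) { return label + " handles " + path; }
        };
    }

    public static void main(String[] args) {
        Registrar local = () -> List.of(named("file", "local"));
        Registrar hadoop = () -> List.of(named("hdfs", "hadoop"));
        FileSystemRegistry registry = new FileSystemRegistry(List.of(local, hadoop));
        System.out.println(registry.forPath("hdfs://nn/data/part-0").describe("hdfs://nn/data/part-0"));
        System.out.println(registry.forPath("/tmp/x").describe("/tmp/x"));
    }
}
```

The core only depends on the `Registrar` interface. Under this design, adding HDFS or S3 support means putting one more registrar on the classpath, and a path with an unregistered scheme fails fast instead of silently falling back.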
However, I agree with your feeling that having this support out of the box would
be nice for user experience. We can cover that with better documentation or with
some starter ("batteries included") Maven POMs, e.g. one for people who are fully
on GCP with Google Cloud Storage support out of the box, one for people on Spark
that brings in the Hadoop extensions ready to use, etc.
> Add a Hadoop FileSystem implementation of Beam's FileSystem
> -----------------------------------------------------------
>
> Key: BEAM-2005
> URL: https://issues.apache.org/jira/browse/BEAM-2005
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-extensions
> Reporter: Stephen Sisk
> Assignee: Stephen Sisk
> Fix For: First stable release
>
>
> Beam's FileSystem creates an abstraction for reading from files in many
> different places.
> We should add a Hadoop FileSystem implementation
> (https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html)
> - that would enable us to read from any file system that implements
> Hadoop's FileSystem (including HDFS, Azure, S3, etc.)
> I'm investigating this now.
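A single Hadoop-backed implementation could cover HDFS, Azure, S3 and others because Hadoop itself chooses the concrete store from the URI scheme. The sketch below illustrates that delegation and assumes a plain `Map` standing in for Hadoop's `Configuration`. `HadoopBackedAdapter` and `resolveImpl` are hypothetical, and only the `fs.<scheme>.impl` key convention is borrowed from Hadoop. A real adapter would delegate to `org.apache.hadoop.fs.FileSystem.get(URI, Configuration)` instead of looking up class names itself.

```java
import java.net.URI;
import java.util.Map;

public class HadoopAdapterSketch {

    // Hypothetical adapter: one Beam-side filesystem that defers the choice of
    // concrete store to Hadoop-style configuration keyed by URI scheme.
    static final class HadoopBackedAdapter {
        private final Map<String, String> conf;

        HadoopBackedAdapter(Map<String, String> conf) {
            this.conf = conf;
        }

        // Mimics the "fs.<scheme>.impl" lookup convention and returns the
        // configured implementation class name for the path's scheme.
        String resolveImpl(String path) {
            String scheme = URI.create(path).getScheme();
            if (scheme == null) {
                throw new IllegalArgumentException("Path has no scheme: " + path);
            }
            String impl = conf.get("fs." + scheme + ".impl");
            if (impl == null) {
                throw new IllegalArgumentException("No Hadoop implementation configured for scheme: " + scheme);
            }
            return impl;
        }
    }

    public static void main(String[] args) {
        // Illustrative configuration; real deployments usually get these from Hadoop defaults.
        Map<String, String> conf = Map.of(
            "fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem",
            "fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        HadoopBackedAdapter adapter = new HadoopBackedAdapter(conf);
        System.out.println(adapter.resolveImpl("hdfs://namenode:8020/logs/day1"));
        System.out.println(adapter.resolveImpl("s3a://bucket/logs/day1"));
    }
}
```

Here, supporting another store is a configuration change on the Hadoop side rather than a new Beam filesystem. The trade-off is that every such store pulls in the Hadoop dependency tree, which is the concern raised in the comment above.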
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)