[ https://issues.apache.org/jira/browse/BEAM-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976952#comment-15976952 ]
Stephen Sisk commented on BEAM-2005:
------------------------------------

I don't want to derail this conversation, but I do have a couple of other concerns.

Beam's FileSystem has a copy() command, but I can't find a good analog in Hadoop's FileSystem. https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html shows lots of methods for copying to/from local files, but no "copy between these two arbitrary paths".

I also believe that since Beam FileSystem objects are configured via PipelineOptions, we need to pass a Hadoop Configuration through PipelineOptions. I think that's very solvable, but it does seem semi-complicated.

I'm going to open subtasks so we can discuss these in separate threads.

> Add a Hadoop FileSystem implementation of Beam's FileSystem
> -----------------------------------------------------------
>
>                 Key: BEAM-2005
>                 URL: https://issues.apache.org/jira/browse/BEAM-2005
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-extensions
>            Reporter: Stephen Sisk
>            Assignee: Stephen Sisk
>             Fix For: First stable release
>
>
> Beam's FileSystem creates an abstraction for reading from files in many
> different places.
> We should add a Hadoop FileSystem implementation
> (https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html)
> - that would enable us to read from any file system that implements
> FileSystem (including HDFS, Azure, S3, etc.)
> I'm investigating this now.
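
On the copy() concern in the comment: one possible fallback, sketched here only as an illustration, is to build "copy between two arbitrary paths" from the two stream primitives a file system already exposes (Hadoop's FileSystem has open(Path) and create(Path)). The StreamFs interface, the InMemoryFs class, and the copy helper below are hypothetical stand-ins so the sketch runs without Hadoop or Beam on the classpath; they are not part of either API. Hadoop's FileUtil class also has a static copy helper that takes source and destination FileSystem/Path pairs, which may be worth evaluating in the subtask.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class CopySketch {

    /** Hypothetical stand-in for the open/create primitives of a file system. */
    interface StreamFs {
        InputStream open(String path) throws IOException;
        OutputStream create(String path) throws IOException;
    }

    /** In-memory implementation, present only so the sketch is runnable. */
    static class InMemoryFs implements StreamFs {
        private final Map<String, byte[]> files = new HashMap<>();

        @Override
        public InputStream open(String path) throws IOException {
            byte[] data = files.get(path);
            if (data == null) {
                throw new FileNotFoundException(path);
            }
            return new ByteArrayInputStream(data);
        }

        @Override
        public OutputStream create(String path) {
            // The file becomes visible once the stream is closed.
            return new ByteArrayOutputStream() {
                @Override
                public void close() throws IOException {
                    super.close();
                    files.put(path, toByteArray());
                }
            };
        }
    }

    /** Copies src to dst by streaming bytes; returns the number of bytes copied. */
    static long copy(StreamFs fs, String src, String dst) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        try (InputStream in = fs.open(src); OutputStream out = fs.create(dst)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InMemoryFs fs = new InMemoryFs();
        try (OutputStream out = fs.create("/a.txt")) {
            out.write("hello".getBytes(StandardCharsets.UTF_8));
        }
        long copied = copy(fs, "/a.txt", "/b.txt");
        System.out.println(copied + " bytes copied");
    }
}
```

A streaming copy like this moves every byte through the client, so it would likely be slower than a native server-side copy where the underlying store offers one; that trade-off is something the subtask discussion would need to settle.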