Thanks, JB, for the feedback. Yes, we should provide a hadoop.fs.FileSystem adaptor. As you said, it will make a range of file systems available in Beam.
And, people can choose to implement BeamFileSystem directly to get the best performance (for example, by providing bulk operations).

--
Pei

On Tue, Nov 29, 2016 at 11:11 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Hi Pei,
>
> Rethinking about that, I understand that the purpose of the Beam
> filesystem is to avoid bringing a bunch of dependencies into the core.
> That makes perfect sense.
>
> So, I agree that a Beam filesystem abstraction is fine.
>
> My point is that we should provide a HadoopFilesystem extension/plugin for
> the Beam filesystem asap: that would help us support a good range of
> filesystems quickly.
>
> Just my $0.01 ;)
>
> Regards
> JB
>
> On 11/17/2016 08:18 PM, Pei He wrote:
>
>> Hi JB,
>> My proposals are based on the current IOChannelFactory, and how it is
>> used in FileBasedSink.
>>
>> Let me spend more time investigating the Hadoop FileSystem interface.
>> --
>> Pei
>>
>> On Thu, Nov 17, 2016 at 1:21 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> By the way, Pei, for the record: why introduce BeamFileSystem and not
>>> use the Hadoop FileSystem interface?
>>>
>>> Thanks
>>> Regards
>>> JB
>>>
>>> On 11/17/2016 01:09 AM, Pei He wrote:
>>>
>>>> Hi,
>>>>
>>>> I am working on BEAM-59
>>>> <https://issues.apache.org/jira/browse/BEAM-59> "IOChannelFactory
>>>> redesign". The goals are:
>>>>
>>>> 1. Support file-based IOs (TextIO, AvroIO) with user-defined file
>>>>    systems.
>>>>
>>>> 2. Support configuring any user-defined file system.
>>>>
>>>> And, I drafted the design proposal in two parts to address them in
>>>> order:
>>>>
>>>> Part 1: IOChannelFactory Redesign
>>>> <https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit#>
>>>>
>>>> Summary:
>>>>
>>>> Old API: WritableByteChannel create(String spec, String mimeType);
>>>>
>>>> New API: WritableByteChannel create(URI uri, CreateOptions options);
>>>>
>>>> Noticeable proposed changes:
>>>>
>>>> 1. Include the options parameter in most methods to specify behaviors.
>>>> 2. Replace String with URI to include the scheme for file/directory
>>>>    locations.
>>>> 3. Require file systems to provide a SeekableByteChannel for read.
>>>> 4. Additional methods, such as getMetadata(), rename(), etc.
>>>>
>>>> Part 2: Configurable BeamFileSystem
>>>> <https://docs.google.com/document/d/1-7vo9nLRsEEzDGnb562PuL4q9mUiq_ZVpCAiyyJw8p8/edit#heading=h.p3gc3colc2cs>
>>>>
>>>> Summary:
>>>>
>>>> Old API: IOChannelUtils.getFactory(glob).match(glob);
>>>>
>>>> New API: BeamFileSystems.getFileSystem(glob, config).match(glob);
>>>>
>>>> Looking for comments and feedback.
>>>>
>>>> Thanks
>>>>
>>>> --
>>>>
>>>> Pei
>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
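
To make the "New API" summaries in the thread above a bit more concrete, here is a minimal Java sketch of a URI-based create call plus a scheme-keyed lookup. It is only an illustration under stated assumptions, not the API from the design documents: the only signature taken from the thread is create(URI uri, CreateOptions options). The mimeType field of CreateOptions, the open() method, the Registry class, LocalFileSystem, and roundTrip() are hypothetical names invented for this example, and the local implementation relies solely on java.nio.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

class FileSystemSketch {
  /** Hypothetical options holder; the thread names the type but not its fields. */
  static final class CreateOptions {
    final String mimeType;
    CreateOptions(String mimeType) { this.mimeType = mimeType; }
  }

  /** Assumed shape of the proposed interface; only create() mirrors the thread. */
  interface BeamFileSystem {
    WritableByteChannel create(URI uri, CreateOptions options) throws IOException;
    // Reads return a SeekableByteChannel, as proposed change 3 requires.
    SeekableByteChannel open(URI uri) throws IOException;
  }

  /** Local-disk implementation built on java.nio only (mimeType is ignored here). */
  static final class LocalFileSystem implements BeamFileSystem {
    @Override
    public WritableByteChannel create(URI uri, CreateOptions options) throws IOException {
      return Files.newByteChannel(Paths.get(uri),
          StandardOpenOption.CREATE,
          StandardOpenOption.TRUNCATE_EXISTING,
          StandardOpenOption.WRITE);
    }

    @Override
    public SeekableByteChannel open(URI uri) throws IOException {
      return Files.newByteChannel(Paths.get(uri), StandardOpenOption.READ);
    }
  }

  /** Hypothetical registry that picks a file system from the URI scheme. */
  static final class Registry {
    private final Map<String, BeamFileSystem> byScheme = new HashMap<>();

    void register(String scheme, BeamFileSystem fs) { byScheme.put(scheme, fs); }

    BeamFileSystem get(URI uri) {
      BeamFileSystem fs = byScheme.get(uri.getScheme());
      if (fs == null) {
        throw new IllegalArgumentException(
            "No file system registered for scheme: " + uri.getScheme());
      }
      return fs;
    }
  }

  /** Writes text to a temp file through the registry, then reads it back. */
  static String roundTrip(String text) throws IOException {
    Registry registry = new Registry();
    registry.register("file", new LocalFileSystem());
    Path tmp = Files.createTempFile("beam-fs-sketch", ".txt");
    URI uri = tmp.toUri(); // a file: URI, so the scheme is "file"
    try {
      BeamFileSystem fs = registry.get(uri);
      try (WritableByteChannel out = fs.create(uri, new CreateOptions("text/plain"))) {
        ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
          out.write(buf);
        }
      }
      try (SeekableByteChannel in = fs.open(uri)) {
        ByteBuffer buf = ByteBuffer.allocate((int) in.size());
        while (buf.hasRemaining() && in.read(buf) >= 0) {
          // keep reading until the buffer is full or end of file is reached
        }
        return new String(buf.array(), 0, buf.position(), StandardCharsets.UTF_8);
      }
    } finally {
      Files.deleteIfExists(tmp);
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(roundTrip("hello beam"));
  }
}
```

In this sketch, a Hadoop-backed adaptor of the kind JB asks for would be one more BeamFileSystem implementation registered under schemes such as "hdfs", while a file system that needs bulk operations could implement the interface directly, which is the trade-off discussed in the replies.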