FWIW, I'm pretty sure this <https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/util-hadoop> is Google's gs connector for Hadoop, I think this <https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws/2.6.0> should work for S3, and Azure's is here <https://hadoop.apache.org/docs/stable2/hadoop-azure/index.html>. So if we go with Hadoop's FileSystem interface, we are already compatible with HDFS, GCS, S3, and Azure.
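
To make the scheme point concrete, here is a rough sketch in plain Java (no Hadoop dependency) of how the scheme of a URI can pick the implementation, which is roughly the idea behind both the Hadoop-style lookup and the URI-based proposal quoted below. Every name in it (SchemeDispatchSketch, SimpleFileSystem, register, forUri) is made up for illustration and is not a Beam or Hadoop API. The scheme strings (hdfs, gs, s3a, wasb) are the ones I believe those connectors use, so treat them as assumptions.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustration only: every name below is hypothetical, not a Beam or Hadoop API.
public class SchemeDispatchSketch {

    /** Stand-in for a file system implementation; a real one would open channels. */
    interface SimpleFileSystem {
        String describe(URI uri);
    }

    // Maps a lower-cased URI scheme to the implementation that handles it.
    static final Map<String, SimpleFileSystem> REGISTRY = new HashMap<>();

    static void register(String scheme, SimpleFileSystem fs) {
        REGISTRY.put(scheme.toLowerCase(Locale.ROOT), fs);
    }

    /** Looks up the implementation by scheme; a URI without a scheme is treated as local. */
    static SimpleFileSystem forUri(URI uri) {
        String scheme = uri.getScheme() == null
                ? "file"
                : uri.getScheme().toLowerCase(Locale.ROOT);
        SimpleFileSystem fs = REGISTRY.get(scheme);
        if (fs == null) {
            throw new IllegalArgumentException(
                    "No file system registered for scheme: " + scheme);
        }
        return fs;
    }

    public static void main(String[] args) {
        // The scheme strings are assumptions about what each connector registers.
        register("hdfs", uri -> "HDFS: " + uri.getPath());
        register("gs", uri -> "GCS bucket " + uri.getAuthority() + ": " + uri.getPath());
        register("s3a", uri -> "S3 bucket " + uri.getAuthority() + ": " + uri.getPath());
        register("wasb", uri -> "Azure container " + uri.getAuthority() + ": " + uri.getPath());
        register("file", uri -> "local: " + uri.getPath());

        String[] specs = {"hdfs://namenode/logs/a.txt", "gs://my-bucket/logs/a.txt", "/tmp/a.txt"};
        for (String spec : specs) {
            URI uri = URI.create(spec);
            System.out.println(spec + " -> " + forUri(uri).describe(uri));
        }
    }
}
```

The caller only ever hands over a URI, so adding another backend is a matter of registering one more scheme rather than touching the file-based IOs.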

On Thu, Nov 17, 2016 at 9:19 PM Pei He <[email protected]> wrote:

> Hi JB,
> My proposals are based on the current IOChannelFactory, and how they are
> used in FileBasedSink.
>
> Let me spend more time to investigate Hadoop FileSystem interface.
> --
> Pei
>
> On Thu, Nov 17, 2016 at 1:21 AM, Jean-Baptiste Onofré <[email protected]>
> wrote:
>
> > By the way, Pei, for the record: why introduce BeamFileSystem and not
> > use the Hadoop FileSystem interface?
> >
> > Thanks
> > Regards
> > JB
> >
> > On 11/17/2016 01:09 AM, Pei He wrote:
> >
> >> Hi,
> >>
> >> I am working on BEAM-59
> >> <https://issues.apache.org/jira/browse/BEAM-59> "IOChannelFactory
> >> redesign". The goals are:
> >>
> >> 1. Support file-based IOs (TextIO, AvroIO) with user-defined file
> >>    system.
> >>
> >> 2. Support configuring any user-defined file system.
> >>
> >> And, I drafted the design proposal in two parts to address them in
> >> order:
> >>
> >> Part 1: IOChannelFactory Redesign
> >> <https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit#>
> >>
> >> Summary:
> >>
> >> Old API: WritableByteChannel create(String spec, String mimeType);
> >>
> >> New API: WritableByteChannel create(URI uri, CreateOptions options);
> >>
> >> Noticeable proposed changes:
> >>
> >> 1. Include the options parameter in most methods to specify behaviors.
> >> 2. Replace String with URI to include scheme for files/directories
> >>    locations.
> >> 3. Require file systems to provide a SeekableByteChannel for read.
> >> 4. Additional methods, such as getMetadata(), rename(), etc.
> >>
> >> Part 2: Configurable BeamFileSystem
> >> <https://docs.google.com/document/d/1-7vo9nLRsEEzDGnb562PuL4q9mUiq_ZVpCAiyyJw8p8/edit#heading=h.p3gc3colc2cs>
> >>
> >> Summary:
> >>
> >> Old API: IOChannelUtils.getFactory(glob).match(glob);
> >>
> >> New API: BeamFileSystems.getFileSystem(glob, config).match(glob);
> >>
> >> Looking for comments and feedback.
> >>
> >> Thanks
> >>
> >> --
> >>
> >> Pei
> >>
> > --
> > Jean-Baptiste Onofré
> > [email protected]
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
