Hi Pei,
Having read the documents, for part 1 I think that using the Hadoop FileSystem:
https://hadoop.apache.org/docs/r2.7.3/api/org/apache/hadoop/fs/FileSystem.html
would make more sense than introducing the BeamFileSystem interface.
It would allow us to directly support HDFS, FTP, Azure, and S3 out of the
box (as Hadoop FileSystem provides subclasses for those providers).
We could provide a GsFileSystem as a subclass of the Hadoop FileSystem.
Part 2 is OK in terms of configuration.
Let me know if I can work with you on this (in terms of implementation).
Regards
JB
On 11/17/2016 01:09 AM, Pei He wrote:
Hi,
I am working on BEAM-59
<https://issues.apache.org/jira/browse/BEAM-59> "IOChannelFactory
redesign". The goals are:
1. Support file-based IOs (TextIO, AvroIO) with user-defined file systems.
2. Support configuring any user-defined file system.
I drafted the design proposal in two parts to address them in order:
Part 1: IOChannelFactory Redesign
<https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit#>
Summary:
Old API: WritableByteChannel create(String spec, String mimeType);
New API: WritableByteChannel create(URI uri, CreateOptions options);
Noticeable proposed changes:
1. Include an options parameter in most methods to specify behaviors.
2. Replace String with URI to include the scheme for file/directory
locations.
3. Require file systems to provide a SeekableByteChannel for reads.
4. Add methods such as getMetadata(), rename(), etc.
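As a rough illustration of the four changes above, here is a minimal sketch
backed by the local file system through java.nio. It is not the API from the
design doc: the names SketchFileSystem, LocalFileSystem, getSizeBytes() (a
stand-in for getMetadata()) and the single-field CreateOptions are all
assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class FileSystemSketch {

    /** Hypothetical options holder; it only carries the MIME type from the old API. */
    static final class CreateOptions {
        final String mimeType;
        CreateOptions(String mimeType) { this.mimeType = mimeType; }
    }

    /** Hypothetical interface shaped after the proposed changes. */
    interface SketchFileSystem {
        WritableByteChannel create(URI uri, CreateOptions options) throws IOException;
        SeekableByteChannel open(URI uri) throws IOException;
        long getSizeBytes(URI uri) throws IOException; // stand-in for getMetadata()
        void rename(URI src, URI dst) throws IOException;
    }

    /** Local implementation for "file" URIs; the MIME type is ignored here. */
    static final class LocalFileSystem implements SketchFileSystem {
        @Override
        public WritableByteChannel create(URI uri, CreateOptions options) throws IOException {
            return Files.newByteChannel(Paths.get(uri), StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
        }
        @Override
        public SeekableByteChannel open(URI uri) throws IOException {
            return Files.newByteChannel(Paths.get(uri), StandardOpenOption.READ);
        }
        @Override
        public long getSizeBytes(URI uri) throws IOException {
            return Files.size(Paths.get(uri));
        }
        @Override
        public void rename(URI src, URI dst) throws IOException {
            Files.move(Paths.get(src), Paths.get(dst), StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Writes a temp file, renames it, then seeks into it and reads the tail. */
    static String demo() {
        try {
            SketchFileSystem fs = new LocalFileSystem();
            Path dir = Files.createTempDirectory("beam-sketch");
            URI tmp = dir.resolve("part.tmp").toUri();
            URI dst = dir.resolve("part.txt").toUri();

            try (WritableByteChannel out = fs.create(tmp, new CreateOptions("text/plain"))) {
                ByteBuffer data = ByteBuffer.wrap("hello beam".getBytes(StandardCharsets.UTF_8));
                while (data.hasRemaining()) {
                    out.write(data);
                }
            }
            fs.rename(tmp, dst);

            ByteBuffer buf = ByteBuffer.allocate(4);
            try (SeekableByteChannel in = fs.open(dst)) {
                in.position(6); // skip "hello "
                while (buf.hasRemaining() && in.read(buf) != -1) {
                    // keep reading until the buffer is full or the file ends
                }
            }
            String tail = new String(buf.array(), 0, buf.position(), StandardCharsets.UTF_8);
            return fs.getSizeBytes(dst) + ":" + tail;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The URI carries the scheme, so a dispatcher could pick the implementation
from uri.getScheme(); the seekable read channel is what lets a reader jump to
an arbitrary offset, as demo() does.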
Part 2: Configurable BeamFileSystem
<https://docs.google.com/document/d/1-7vo9nLRsEEzDGnb562PuL4q9mUiq_ZVpCAiyyJw8p8/edit#heading=h.p3gc3colc2cs>
Summary:
Old API: IOChannelUtils.getFactory(glob).match(glob);
New API: BeamFileSystems.getFileSystem(glob, config).match(glob);
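One way to read the new lookup is as a registry keyed by the scheme of the
glob, with the config handed to the file system when it is instantiated. The
sketch below only illustrates that idea; the registry, the "mem" scheme, the
"files" config key and the in-memory file system are hypothetical and do not
come from the design doc.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class FileSystemRegistrySketch {

    /** Hypothetical minimal contract: only match() is modeled. */
    interface SketchFileSystem {
        List<String> match(String glob);
    }

    /** In-memory stand-in; its "files" come from the config key "files" (comma-separated). */
    static final class InMemoryFileSystem implements SketchFileSystem {
        private final List<String> files;

        InMemoryFileSystem(Map<String, String> config) {
            this.files = Arrays.asList(config.getOrDefault("files", "").split(","));
        }

        @Override
        public List<String> match(String glob) {
            // '*' becomes "any run of non-separator characters"; the rest is quoted literally.
            Pattern pattern = Pattern.compile(
                Arrays.stream(glob.split("\\*", -1))
                    .map(Pattern::quote)
                    .collect(Collectors.joining("[^/]*")));
            return files.stream()
                .filter(f -> pattern.matcher(f).matches())
                .collect(Collectors.toList());
        }
    }

    /** Scheme -> factory that builds a file system from user configuration. */
    private static final Map<String, Function<Map<String, String>, SketchFileSystem>> REGISTRY =
        new HashMap<>();

    static void register(String scheme, Function<Map<String, String>, SketchFileSystem> factory) {
        REGISTRY.put(scheme, factory);
    }

    /** Picks the file system from the glob's scheme; globs without a scheme default to "file". */
    static SketchFileSystem getFileSystem(String glob, Map<String, String> config) {
        int idx = glob.indexOf("://");
        String scheme = idx < 0 ? "file" : glob.substring(0, idx);
        Function<Map<String, String>, SketchFileSystem> factory = REGISTRY.get(scheme);
        if (factory == null) {
            throw new IllegalArgumentException("No file system registered for scheme: " + scheme);
        }
        return factory.apply(config);
    }

    static List<String> demo() {
        register("mem", InMemoryFileSystem::new);
        Map<String, String> config = new HashMap<>();
        config.put("files", "mem://logs/a.txt,mem://logs/sub/b.txt,mem://logs/c.csv");
        String glob = "mem://logs/*.txt";
        return getFileSystem(glob, config).match(glob);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The scheme is parsed by hand rather than with java.net.URI because glob
characters such as '[' and '{' are not legal in a URI.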
Looking for comments and feedback.
Thanks
--
Pei
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com