I really like this document. It is easy to read and informative. Three
things not addressed by the document:

1. Major Beam use cases. I'm sure we have a few in the SDK that could be
outlined in terms of the new API with pseudocode.
2. Related work. How does this differ from other filesystem APIs and why?
3. Discussion of non-Java languages. It would be good to know what classes
in e.g. Python we might use in place of URI, SeekableByteChannel, etc.

On Mon, Dec 5, 2016 at 4:41 PM, Pei He <[email protected]> wrote:

> I have received a lot of comments on "Part 1: IOChannelFactory
> Redesign" [1], and I have updated the design based on the feedback.
>
> Now, I feel it is close to being ready for implementation, and I would like to
> summarize the changes:
> 1. Replaced FilePath with URI for resolving file paths.
> 2. Required match(String spec) to handle ambiguities in user-provided
> strings (see the match() javadoc in the design doc for details).
> 3. Changed Metadata to use the Future.get() paradigm, and removed exception().
> 4. Changed methods on the FileSystem interface to be protected (visible to
> implementors), and created a FileSystems utility (visible to callers).
> 5. Simplified the FileSystem interface by moving operation options, such as
> DeleteOptions and MatchOptions, to the FileSystems utility.
> 6. Simplified the FileSystem interface by requiring certain behaviors, such as
> creating recursively and throwing for missing files.
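A minimal Java sketch of the split described in points 4 to 6, under stated assumptions: the abstract `FileSystem` exposes only protected methods and is strict (its `delete` throws for a missing file), while the caller-facing `FileSystems` utility interprets an option before delegating. The `exists` helper, the `DeleteOption` enum, and `InMemoryFileSystem` are hypothetical names chosen for illustration, not the signatures in the design doc.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collection;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical implementor-facing API: protected methods, strict behavior, no options. */
abstract class FileSystem {
  /** Must throw FileNotFoundException if any URI does not exist. */
  protected abstract void delete(Collection<URI> uris) throws IOException;

  /** Assumed helper (not from the proposal) so the utility can apply options. */
  protected abstract boolean exists(URI uri) throws IOException;
}

/** Hypothetical caller-facing utility that owns the operation options. */
final class FileSystems {
  enum DeleteOption { IGNORE_MISSING_FILES }

  private FileSystems() {}

  static void delete(FileSystem fs, Collection<URI> uris, DeleteOption... options)
      throws IOException {
    Set<DeleteOption> opts = EnumSet.noneOf(DeleteOption.class);
    for (DeleteOption option : options) {
      opts.add(option);
    }
    List<URI> toDelete = new ArrayList<>();
    for (URI uri : uris) {
      // The option is interpreted here, so implementors never see it.
      if (opts.contains(DeleteOption.IGNORE_MISSING_FILES) && !fs.exists(uri)) {
        continue;
      }
      toDelete.add(uri);
    }
    fs.delete(toDelete);
  }
}

/** Toy in-memory implementation, only to exercise the sketch. */
class InMemoryFileSystem extends FileSystem {
  final Set<URI> files = new HashSet<>();

  @Override
  protected void delete(Collection<URI> uris) throws IOException {
    // Validate everything first so a missing file leaves the set untouched.
    for (URI uri : uris) {
      if (!files.contains(uri)) {
        throw new FileNotFoundException(uri.toString());
      }
    }
    files.removeAll(uris);
  }

  @Override
  protected boolean exists(URI uri) {
    return files.contains(uri);
  }
}
```

Keeping options in the utility means each implementor handles exactly one behavior per operation, which is the simplification points 5 and 6 aim for.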
>
> Any thoughts / feedback?
> --
> Pei
>
> [1]
> https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit#
>
> On Wed, Nov 30, 2016 at 1:32 PM, Pei He <[email protected]> wrote:
>
> > Thanks JB for the feedback.
> >
> > Yes, we should provide a hadoop.fs.FileSystem adaptor. As you said, it
> > will make a range of file systems available in Beam.
> >
> > People can also choose to implement BeamFileSystem directly to get the
> > best performance (for example, by providing bulk operations).
> >
> > --
> > Pei
> >
> >
> >
> > On Tue, Nov 29, 2016 at 11:11 AM, Jean-Baptiste Onofré <[email protected]>
> > wrote:
> >
> >> Hi Pei,
> >>
> >> Thinking about it again, I understand that the purpose of the Beam
> >> filesystem is to avoid bringing a bunch of dependencies into the core.
> >> That makes perfect sense.
> >>
> >> So, I agree that a Beam filesystem abstraction is fine.
> >>
> >> My point is that we should provide a HadoopFilesystem extension/plugin
> >> for Beam filesystem asap: that would help us to support a good range of
> >> filesystems quickly.
> >>
> >> Just my $0.01 ;)
> >>
> >> Regards
> >> JB
> >>
> >>
> >> On 11/17/2016 08:18 PM, Pei He wrote:
> >>
> >>> Hi JB,
> >>> My proposals are based on the current IOChannelFactory and how it is
> >>> used in FileBasedSink.
> >>>
> >>> Let me spend more time investigating the Hadoop FileSystem interface.
> >>> --
> >>> Pei
> >>>
> >>> On Thu, Nov 17, 2016 at 1:21 AM, Jean-Baptiste Onofré <[email protected]>
> >>> wrote:
> >>>
> >>> By the way, Pei, for the record: why introduce BeamFileSystem instead of
> >>>> using the Hadoop FileSystem interface?
> >>>>
> >>>> Thanks
> >>>> Regards
> >>>> JB
> >>>>
> >>>> On 11/17/2016 01:09 AM, Pei He wrote:
> >>>>
> >>>> Hi,
> >>>>>
> >>>>> I am working on BEAM-59
> >>>>> <https://issues.apache.org/jira/browse/BEAM-59> "IOChannelFactory
> >>>>> redesign". The goals are:
> >>>>>
> >>>>> 1. Support file-based IOs (TextIO, AvroIO) with user-defined file
> >>>>> systems.
> >>>>>
> >>>>> 2. Support configuring any user-defined file system.
> >>>>>
> >>>>> I drafted the design proposal in two parts to address them in
> >>>>> order:
> >>>>>
> >>>>> Part 1: IOChannelFactory Redesign
> >>>>> <https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit#>
> >>>>>
> >>>>> Summary:
> >>>>>
> >>>>> Old API: WritableByteChannel create(String spec, String mimeType);
> >>>>>
> >>>>> New API: WritableByteChannel create(URI uri, CreateOptions options);
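As a rough illustration of the new signature, here is a sketch for local `file:` URIs only. `CreateOptions` carrying the old `mimeType` argument is an assumption (the real fields are defined in the design doc), and `LocalCreate` is a hypothetical name. Parent directories are created recursively with `Files.createDirectories`.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Hypothetical options holder; here it only carries the MIME type the old API took as a String. */
final class CreateOptions {
  final String mimeType;

  CreateOptions(String mimeType) {
    this.mimeType = mimeType;
  }
}

/** Hypothetical local-file implementation of create(URI, CreateOptions). */
final class LocalCreate {
  private LocalCreate() {}

  static WritableByteChannel create(URI uri, CreateOptions options) throws IOException {
    // The scheme is explicit in the URI, so an implementation can reject URIs it does not own.
    if (!"file".equals(uri.getScheme())) {
      throw new IllegalArgumentException("Unsupported scheme: " + uri.getScheme());
    }
    Path path = Paths.get(uri);
    Path parent = path.getParent();
    if (parent != null) {
      // Create any missing parent directories recursively.
      Files.createDirectories(parent);
    }
    // options.mimeType is unused for local files; an object store might map it to a content type.
    return FileChannel.open(
        path,
        StandardOpenOption.CREATE,
        StandardOpenOption.WRITE,
        StandardOpenOption.TRUNCATE_EXISTING);
  }
}
```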
> >>>>>
> >>>>> Notable proposed changes:
> >>>>>
> >>>>>    1. Include an options parameter in most methods to specify behaviors.
> >>>>>    2. Replace String with URI so that file and directory locations
> >>>>>       include a scheme.
> >>>>>    3. Require file systems to provide a SeekableByteChannel for reads.
> >>>>>    4. Add methods such as getMetadata(), rename(), etc.
> >>>>>
> >>>>>
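One practical reason to require a SeekableByteChannel for reads is that a file-based source can jump straight to a split's start offset instead of reading and discarding everything before it. A hypothetical sketch for local files; `RangeReader`, `open`, and `readRange` are illustrative names, not part of the proposal.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Paths;

final class RangeReader {
  private RangeReader() {}

  /** Hypothetical open(URI) for local files, returning a seekable channel. */
  static SeekableByteChannel open(URI uri) throws IOException {
    return Files.newByteChannel(Paths.get(uri));
  }

  /** Reads up to length bytes starting at offset, as a reader for one split might. */
  static byte[] readRange(URI uri, long offset, int length) throws IOException {
    try (SeekableByteChannel ch = open(uri)) {
      // Seek directly to the start of the range.
      ch.position(offset);
      ByteBuffer buf = ByteBuffer.allocate(length);
      while (buf.hasRemaining()) {
        if (ch.read(buf) < 0) {
          break; // end of file reached before the range was filled
        }
      }
      buf.flip();
      byte[] out = new byte[buf.remaining()];
      buf.get(out);
      return out;
    }
  }
}
```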
> >>>>> Part 2: Configurable BeamFileSystem
> >>>>> <https://docs.google.com/document/d/1-7vo9nLRsEEzDGnb562PuL4q9mUiq_ZVpCAiyyJw8p8/edit#heading=h.p3gc3colc2cs>
> >>>>>
> >>>>> Summary:
> >>>>>
> >>>>> Old API: IOChannelUtils.getFactory(glob).match(glob);
> >>>>>
> >>>>> New API: BeamFileSystems.getFileSystem(glob, config).match(glob);
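A sketch of how a call like `BeamFileSystems.getFileSystem(glob, config)` might dispatch on the scheme. The registry, the `Map<String, String>` config type, the `scheme://` parsing rule, and the fallback to `file` for scheme-less paths are all assumptions made for illustration; `BeamFileSystem` here is a one-method stand-in, not the proposed interface.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical one-method stand-in for BeamFileSystem. */
interface BeamFileSystem {
  String describe();
}

final class BeamFileSystems {
  // Assumed rule: "scheme://" prefix selects the file system; anything else is a local path.
  private static final Pattern SCHEME = Pattern.compile("^([a-zA-Z][a-zA-Z0-9+.-]*)://");
  private static final Map<String, Function<Map<String, String>, BeamFileSystem>> FACTORIES =
      new HashMap<>();

  private BeamFileSystems() {}

  static void register(String scheme, Function<Map<String, String>, BeamFileSystem> factory) {
    FACTORIES.put(scheme.toLowerCase(Locale.ROOT), factory);
  }

  static String parseScheme(String spec) {
    Matcher m = SCHEME.matcher(spec);
    return m.find() ? m.group(1).toLowerCase(Locale.ROOT) : "file";
  }

  /** Picks a factory by scheme and builds the file system from the caller's config. */
  static BeamFileSystem getFileSystem(String spec, Map<String, String> config) {
    String scheme = parseScheme(spec);
    Function<Map<String, String>, BeamFileSystem> factory = FACTORIES.get(scheme);
    if (factory == null) {
      throw new IllegalArgumentException("No file system registered for scheme: " + scheme);
    }
    return factory.apply(config);
  }
}
```

Passing the config at lookup time is what lets the same scheme be configured differently per pipeline, which is the goal Part 2 states.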
> >>>>>
> >>>>>
> >>>>> Looking for comments and feedback.
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Pei
> >>>>>
> >>>>>
> >>>>> --
> >>>> Jean-Baptiste Onofré
> >>>> [email protected]
> >>>> http://blog.nanthrax.net
> >>>> Talend - http://www.talend.com
> >>>>
> >>>>
> >>>
> >>
> >
> >
>
