Thanks all. The first PR is out for review:
https://github.com/apache/beam/pull/3443
Next work (watching for new files) is in progress, based on
https://github.com/apache/beam/pull/3360

On Tue, Jun 27, 2017 at 11:22 AM Kenneth Knowles <k...@google.com.invalid>
wrote:

> +1
>
> This is a really nice doc and plan.
>
> On Tue, Jun 27, 2017 at 1:49 AM, Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
> > +1
> >
> > This sounds very good and there is a clear implementation path!
> >
> > > On 24. Jun 2017, at 20:55, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
> > >
> > > Fair enough ;)
> > >
> > > Let me review the different Jira and provide some feedback.
> > >
> > > Regards
> > > JB
> > >
> > > > On Jun 24, 2017, at 20:54, Eugene Kirpichov
> > <kirpic...@google.com.INVALID> wrote:
> > >> Hi JB,
> > > >> I haven't yet thought about how this work can be parallelized. For now I'd
> > > >> like to just get feedback on the approach :)
> > > >> But glad that you're willing to help out - let's discuss this too a bit
> > > >> later!
> > >>
> > >> On Sat, Jun 24, 2017 at 11:51 AM Jean-Baptiste Onofré <
> j...@nanthrax.net>
> > >> wrote:
> > >>
> > >>> Thanks Eugene
> > >>>
> > >>> I will pick up some.
> > >>>
> > >>> Regards
> > >>> JB
> > >>>
> > > >>> On Jun 24, 2017, at 20:00, Eugene Kirpichov
> > >>> <kirpic...@google.com.INVALID> wrote:
> > > >>>> Filed JIRAs for the proposed features and linked with the doc:
> > > >>>> https://issues.apache.org/jira/browse/BEAM-2511 TextIO should support
> > > >>>> reading a PCollection of filenames
> > > >>>> https://issues.apache.org/jira/browse/BEAM-2512 TextIO should support
> > > >>>> watching for new files
> > > >>>> https://issues.apache.org/jira/browse/BEAM-2513 TextIO should support
> > > >>>> watching files for new entries
> > >>>>
> > >>>> On Fri, Jun 23, 2017 at 4:32 PM Eugene Kirpichov
> > >> <kirpic...@google.com>
> > >>>> wrote:
> > >>>>
> > >>>>> Hi all,
> > >>>>>
> > > >>>>> I've written up a proposal for incrementally delivering a bunch of useful
> > > >>>>> new features in TextIO based on Splittable DoFn. It's applicable to other
> > > >>>>> file-based connectors, TextIO is just one good example. Let me know what
> > > >>>>> you think!
> > >>>>>
> > >>>>> https://s.apache.org/textio-sdf
> > >>>>>
> > >>>>> Copy of abstract:
> > >>>>>
> > > >>>>> Users have often expressed interest in several new features for reading
> > > >>>>> files - in particular, incremental reading of log files (streaming of new
> > > >>>>> files matching a pattern and new entries in each file) and reading a
> > > >>>>> PCollection of filenames (in particular, an unbounded collection arriving
> > > >>>>> from a stream such as PubSub or Kafka).
> > >>>>>
> > > >>>>> Splittable DoFn <http://s.apache.org/splittable-do-fn> (SDF) enables
> > > >>>>> these features. This document proposes an API for them, using the example
> > > >>>>> of TextIO, and proposes a plan for delivering them subject to
> > > >>>>> availability of SDF in different runners. Some availability constraints are
> > > >>>>> circumvented by Running Splittable DoFn via Source API
> > > >>>>> <http://s.apache.org/sdf-via-source>.
> > >>>>>
> > > >>>>> TL;DR Read a collection of filepatterns arriving on PubSub via
> > > >>>>> files.apply(TextIO.readEach()). Tail a filepattern via
> > > >>>>> TextIO.read().watchForNewFiles().watchFilesForNewEntries(). Coming to a
> > > >>>>> Beam SDK near you in small pieces.
> > >>>>>
> > > >>>>> I think I'm gonna start working on the first steps of the proposed plan,
> > > >>>>> in parallel with this discussion, because I'm excited :)
> > >>>>>
> > >>>
> >
> >
>
