Fair enough ;)

Let me review the different JIRAs and provide some feedback.

Regards
JB

On Jun 24, 2017, 20:54, at 20:54, Eugene Kirpichov <kirpic...@google.com.INVALID> wrote:
>Hi JB,
>I haven't yet thought about how this work can be parallelized. For now
>I'd like to just get feedback on the approach :)
>But glad that you're willing to help out - let's discuss this too a bit later!
>
>On Sat, Jun 24, 2017 at 11:51 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
>> Thanks Eugene
>>
>> I will pick up some.
>>
>> Regards
>> JB
>>
>> On Jun 24, 2017, 20:00, at 20:00, Eugene Kirpichov <kirpic...@google.com.INVALID> wrote:
>> >Filed JIRAs for the proposed features and linked them with the doc:
>> >https://issues.apache.org/jira/browse/BEAM-2511 TextIO should support reading a PCollection of filenames
>> >https://issues.apache.org/jira/browse/BEAM-2512 TextIO should support watching for new files
>> >https://issues.apache.org/jira/browse/BEAM-2513 TextIO should support watching files for new entries
>> >
>> >On Fri, Jun 23, 2017 at 4:32 PM Eugene Kirpichov <kirpic...@google.com>
>> >wrote:
>> >
>> >> Hi all,
>> >>
>> >> I've written up a proposal for incrementally delivering a bunch of useful
>> >> new features in TextIO based on Splittable DoFn. It's applicable to other
>> >> file-based connectors; TextIO is just one good example. Let me know what
>> >> you think!
>> >>
>> >> https://s.apache.org/textio-sdf
>> >>
>> >> Copy of abstract:
>> >>
>> >> Users have often expressed interest in several new features for reading
>> >> files - in particular, incremental reading of log files (streaming of new
>> >> files matching a pattern and new entries in each file) and reading a
>> >> PCollection of filenames (in particular, an unbounded collection arriving
>> >> from a stream such as PubSub or Kafka).
>> >>
>> >> Splittable DoFn <http://s.apache.org/splittable-do-fn> (SDF) enables
>> >> these features. This document proposes an API for them, using the example
>> >> of TextIO, and proposes a plan for delivering them subject to the
>> >> availability of SDF in different runners. Some availability constraints are
>> >> circumvented by Running Splittable DoFn via Source API
>> >> <http://s.apache.org/sdf-via-source>.
>> >>
>> >> TL;DR Read a collection of filepatterns arriving on PubSub via
>> >> files.apply(TextIO.readEach()). Tail a filepattern via
>> >> TextIO.read().watchForNewFiles().watchFilesForNewEntries(). Coming to a
>> >> Beam SDK near you in small pieces.
>> >>
>> >> I think I'm gonna start working on the first steps of the proposed plan,
>> >> in parallel with this discussion, because I'm excited :)
>> >>
>>
