+1

This sounds very good and there is a clear implementation path!

> On 24. Jun 2017, at 20:55, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> 
> Fair enough ;)
> 
> Let me review the different JIRAs and provide some feedback.
> 
> Regards
> JB
> 
> On Jun 24, 2017, at 20:54, Eugene Kirpichov
> <kirpic...@google.com.INVALID> wrote:
>> Hi JB,
>> I haven't yet thought about how this work can be parallelized. For now
>> I'd like to just get feedback on the approach :)
>> But glad that you're willing to help out - let's discuss this too a bit
>> later!
>> 
>> On Sat, Jun 24, 2017 at 11:51 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>> 
>>> Thanks Eugene
>>> 
>>> I will pick up some.
>>> 
>>> Regards
>>> JB
>>> 
>>> On Jun 24, 2017, at 20:00, Eugene Kirpichov
>>> <kirpic...@google.com.INVALID> wrote:
>>>> Filed JIRAs for the proposed features and linked with the doc:
>>>> https://issues.apache.org/jira/browse/BEAM-2511 TextIO should support
>>>> reading a PCollection of filenames
>>>> https://issues.apache.org/jira/browse/BEAM-2512 TextIO should support
>>>> watching for new files
>>>> https://issues.apache.org/jira/browse/BEAM-2513 TextIO should support
>>>> watching files for new entries
>>>> 
>>>> On Fri, Jun 23, 2017 at 4:32 PM Eugene Kirpichov <kirpic...@google.com>
>>>> wrote:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I've written up a proposal for incrementally delivering a bunch of useful
>>>>> new features in TextIO based on Splittable DoFn. It's applicable to other
>>>>> file-based connectors; TextIO is just one good example. Let me know what
>>>>> you think!
>>>>> 
>>>>> https://s.apache.org/textio-sdf
>>>>> 
>>>>> Copy of abstract:
>>>>> 
>>>>> Users have often expressed interest in several new features for reading
>>>>> files - in particular, incremental reading of log files (streaming of new
>>>>> files matching a pattern and new entries in each file) and reading a
>>>>> PCollection of filenames (in particular, an unbounded collection arriving
>>>>> from a stream such as PubSub or Kafka).
>>>>> 
>>>>> Splittable DoFn <http://s.apache.org/splittable-do-fn> (SDF) enables
>>>>> these features. This document proposes an API for them, using the example
>>>>> of TextIO, and proposes a plan for delivering them subject to
>>>>> availability of SDF in different runners. Some availability constraints are
>>>>> circumvented by Running Splittable DoFn via Source API
>>>>> <http://s.apache.org/sdf-via-source>.
>>>>> 
>>>>> TL;DR Read a collection of filepatterns arriving on PubSub via
>>>>> files.apply(TextIO.readEach()). Tail a filepattern via
>>>>> TextIO.read().watchForNewFiles().watchFilesForNewEntries(). Coming to a
>>>>> Beam SDK near you in small pieces.
>>>>> 
>>>>> I think I'm gonna start working on the first steps of the proposed plan,
>>>>> in parallel with this discussion, because I'm excited :)
>>>>> 
>>> 
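
For readers skimming the thread: the "tail a filepattern" behaviour in the quoted TL;DR (pick up new files matching a pattern, then pick up new entries in each file) can be pictured with a rough plain-JDK sketch. This is not the Beam API and not the proposal's implementation; the class name FilePatternTailer and its poll() method are hypothetical, and the sketch simply re-reads each file on every poll, which a real connector would avoid by tracking byte offsets.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Beam): polls a glob for new files and new lines. */
public class FilePatternTailer {
    private final Path dir;
    private final String glob;
    // Number of lines already emitted for each file seen so far.
    private final Map<Path, Integer> linesSeen = new HashMap<>();

    public FilePatternTailer(Path dir, String glob) {
        this.dir = dir;
        this.glob = glob;
    }

    /** Returns the lines that appeared since the previous poll, across all matching files. */
    public List<String> poll() throws IOException {
        List<String> fresh = new ArrayList<>();
        // Re-listing the directory on each poll is what discovers newly created files.
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
            for (Path file : matches) {
                List<String> all = Files.readAllLines(file, StandardCharsets.UTF_8);
                int seen = linesSeen.getOrDefault(file, 0);
                if (all.size() > seen) {
                    // Only the suffix beyond what was already emitted counts as new entries.
                    fresh.addAll(all.subList(seen, all.size()));
                    linesSeen.put(file, all.size());
                }
            }
        }
        return fresh;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("tail-demo");
        FilePatternTailer tailer = new FilePatternTailer(dir, "*.log");

        Files.write(dir.resolve("a.log"), Arrays.asList("first"));
        System.out.println(tailer.poll()); // new file, so its line is reported

        Files.write(dir.resolve("a.log"), Arrays.asList("second"), StandardOpenOption.APPEND);
        System.out.println(tailer.poll()); // only the appended line is reported
    }
}
```

The sketch assumes files are append-only and that each poll sees whole lines; truncation, rotation, and partially written lines are left out.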
