PR out for review https://github.com/apache/beam/pull/3817

Next steps are to clean it up (in this PR) and to implement sinks for Text,
XML and TFRecord (in subsequent PRs).

On Thu, Sep 7, 2017 at 9:57 AM Robert Bradshaw <[email protected]>
wrote:

> Huge +1.
>
> This brings things more in line with Python's FileBasedSink where one
> simply overrides write[_encoded]_record and, usually, open/close. We
> may want to consider aligning the APIs. (And, of course bringing
> things like DynamicDestinations to Python.)
>
> On Wed, Sep 6, 2017 at 9:24 PM, Jean-Baptiste Onofré <[email protected]>
> wrote:
> > Fantastic.
> >
> > Big +1 for this.
> >
> > Regards
> > JB
> >
> >
> > On 09/07/2017 03:44 AM, Eugene Kirpichov wrote:
> >>
> >> Hi,
> >>
> >> Please take a look at the following proposal.
> >>
> >> I believe, together with the (already available) FileIO.match() and
> >> FileIO.readMatches() this proposal will empower Beam users to address all
> >> use cases of file-based IO I'm aware of - which makes me quite excited.
> >>
> >> http://s.apache.org/fileio-write
> >>
> >> *We propose a new API for writing files in Beam: FileIO.write(). It is more
> >> modular and cleaner to code against than FileBasedSink, and aims to
> >> completely replace it.*
> >>
> >> *FileIO.write() lets an IO author implement only logic and configuration
> >> specific to a particular file format (e.g. Avro) and automatically get all
> >> format-agnostic features, such as sharding, cleanup, windowed writes,
> >> DynamicDestinations, compression, returning the successfully written
> >> filenames, etc.*
> >>
> >> TL;DR:
> >>
> >> FileIO.write(FileSink<DestT, InputT> { open(dest), write(input), close() })
> >>        .to(input → dest)
> >>        .withFilenamePolicy(dest → prefix, shard pattern)
> >>        .withEverythingElse() // like in WriteFiles
> >>
> >
> > --
> > Jean-Baptiste Onofré
> > [email protected]
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
>
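
To make the TL;DR above concrete, here is a minimal, self-contained Java sketch of the split it describes: a format-specific sink that only knows open/write/close, and a format-agnostic driver that routes inputs to destinations and names the files. It does not use Beam and is not the API in the PR. The FileSink interface below is a hypothetical stand-in (the Writer parameter on open() is an assumption so the sketch has somewhere to write), and LineSink, writeAll, demo and the "-00000-of-00001" file naming are illustrative names only.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

class FileSinkSketch {
  // Hypothetical interface mirroring the TL;DR: open(dest), write(input), close().
  // Assumption: open() also receives a Writer for the file being produced.
  interface FileSink<DestT, InputT> {
    void open(DestT dest, Writer out) throws IOException;
    void write(InputT input) throws IOException;
    void close() throws IOException;
  }

  // Format-specific part: one record per line, nothing else.
  static class LineSink implements FileSink<String, String> {
    private Writer out;

    @Override public void open(String dest, Writer out) { this.out = out; }

    @Override public void write(String input) throws IOException {
      out.write(input);
      out.write('\n');
    }

    @Override public void close() throws IOException { out.flush(); }
  }

  // Format-agnostic part: group inputs by destination (input -> dest),
  // derive a file name per destination (dest -> name), and drive the sink.
  // Files are kept in memory as strings to keep the sketch self-contained.
  static <DestT, InputT> Map<String, String> writeAll(
      List<InputT> inputs,
      Function<InputT, DestT> destFn,
      Function<DestT, String> filenameFn,
      Supplier<FileSink<DestT, InputT>> sinkFactory) throws IOException {
    Map<DestT, List<InputT>> byDest = new LinkedHashMap<>();
    for (InputT input : inputs) {
      byDest.computeIfAbsent(destFn.apply(input), d -> new ArrayList<>()).add(input);
    }
    Map<String, String> files = new LinkedHashMap<>();
    for (Map.Entry<DestT, List<InputT>> entry : byDest.entrySet()) {
      StringWriter buffer = new StringWriter();
      FileSink<DestT, InputT> sink = sinkFactory.get();
      sink.open(entry.getKey(), buffer);
      for (InputT input : entry.getValue()) {
        sink.write(input);
      }
      sink.close();
      files.put(filenameFn.apply(entry.getKey()), buffer.toString());
    }
    return files;
  }

  // Routes words to a destination named after their first letter.
  static Map<String, String> demo() {
    try {
      return writeAll(
          Arrays.asList("apple", "avocado", "banana"),
          (String word) -> word.substring(0, 1),
          (String dest) -> "out/" + dest + "-00000-of-00001.txt",
          LineSink::new);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    demo().forEach((name, content) -> System.out.print(name + ":\n" + content));
  }
}
```

In this sketch a sink for another format would only replace LineSink; the grouping and naming logic in writeAll stays untouched, which is the property the proposal is after.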
