Huge +1. This brings things more in line with Python's FileBasedSink, where one simply overrides write[_encoded]_record and, usually, open/close. We may want to consider aligning the APIs. (And, of course, bringing things like DynamicDestinations to Python.)
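
For anyone less familiar with the Python side, here is a rough, self-contained sketch of the override pattern I mean. It does not import Beam: ToySink, LineSink, encode and write_all are hypothetical names made up for illustration, and only the open / write_record / write_encoded_record / close shape is meant to mirror FileBasedSink.

```python
class ToySink:
    """Hypothetical stand-in for a file-based sink (not Beam's actual class)."""

    def open(self, path):
        # Return a handle that later write calls receive.
        return open(path, "wb")

    def encode(self, value):
        # Assumption for this sketch: records are encoded as UTF-8 text.
        return str(value).encode("utf-8")

    def write_record(self, file_handle, value):
        # Default behavior: encode the value, then delegate to the encoded hook.
        self.write_encoded_record(file_handle, self.encode(value))

    def write_encoded_record(self, file_handle, encoded_value):
        # Format authors override this method.
        raise NotImplementedError

    def close(self, file_handle):
        file_handle.close()


class LineSink(ToySink):
    """A format-specific sink: one record per line."""

    def write_encoded_record(self, file_handle, encoded_value):
        file_handle.write(encoded_value + b"\n")


def write_all(sink, path, records):
    # Stand-in for the framework side, which drives open/write/close.
    handle = sink.open(path)
    try:
        for record in records:
            sink.write_record(handle, record)
    finally:
        sink.close(handle)
```

The point is only that the format author touches one or two methods, while everything format-agnostic lives in the code that drives them.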

On Wed, Sep 6, 2017 at 9:24 PM, Jean-Baptiste Onofré <[email protected]> wrote:
> Fantastic.
>
> Big +1 for this.
>
> Regards
> JB
>
>
> On 09/07/2017 03:44 AM, Eugene Kirpichov wrote:
>>
>> Hi,
>>
>> Please take a look at the following proposal.
>>
>> I believe, together with the (already available) FileIO.match() and
>> FileIO.readMatches() this proposal will empower Beam users to address all
>> use cases of file-based IO I'm aware of - which makes me quite excited.
>>
>> http://s.apache.org/fileio-write
>>
>> *We propose a new API for writing files in Beam: FileIO.write(). It is
>> more modular and cleaner to code against than FileBasedSink, and aims to
>> completely replace it.*
>>
>> *FileIO.write() lets an IO author implement only logic and configuration
>> specific to a particular file format (e.g. Avro) and automatically get all
>> format-agnostic features, such as sharding, cleanup, windowed writes,
>> DynamicDestinations, compression, returning the successfully written
>> filenames, etc.*
>>
>> TL;DR:
>>
>> FileIO.write(FileSink<DestT, InputT> { open(dest), write(input), close() })
>>     .to(input → dest)
>>     .withFilenamePolicy(dest → prefix, shard pattern)
>>     .withEverythingElse() // like in WriteFiles
>>
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
