Hi,

Please take a look at the following proposal.

I believe that, together with the already available FileIO.match() and
FileIO.readMatches(), this proposal will let Beam users address every
file-based IO use case I'm aware of, which makes me quite excited.

http://s.apache.org/fileio-write

*We propose a new API for writing files in Beam: FileIO.write(). It is more
modular and cleaner to code against than FileBasedSink, and aims to
completely replace it.*

*FileIO.write() lets an IO author implement only logic and configuration
specific to a particular file format (e.g. Avro) and automatically get all
format-agnostic features, such as sharding, cleanup, windowed writes,
DynamicDestinations, compression, returning the successfully written
filenames, etc.*

TL;DR:

FileIO.write(FileSink<DestT, InputT> { open(dest), write(input), close() })
      .to(input → dest)
      .withFilenamePolicy(dest → prefix, shard pattern)
      .withEverythingElse() // like in WriteFiles
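
To make the split of responsibilities in the TL;DR concrete, here is a
minimal, self-contained sketch in plain Java. It does not use Beam and it is
not the proposed API: FileSinkSketch, CsvSink, writeAll and the StringBuilder
argument to open() are hypothetical names invented for illustration, and the
"-00000-of-00001" suffix is just an assumed shard pattern. The point is only
that the sink holds the format-specific logic, while routing inputs to
destinations and naming the outputs live in a format-agnostic driver.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

public class FileSinkSketch {

    /** Hypothetical format-specific hook, modeled on the TL;DR above. */
    public interface FileSink<DestT, InputT> {
        void open(DestT dest, StringBuilder out); // 'out' stands in for a real file channel
        void write(InputT input);
        void close();
    }

    /** Example format: comma-separated lines under a header naming the destination. */
    public static class CsvSink implements FileSink<String, String[]> {
        private StringBuilder out;

        @Override public void open(String dest, StringBuilder out) {
            this.out = out;
            out.append("# ").append(dest).append('\n');
        }

        @Override public void write(String[] input) {
            out.append(String.join(",", input)).append('\n');
        }

        @Override public void close() {
            out = null; // nothing to flush for an in-memory buffer
        }
    }

    /** Format-agnostic part: route inputs to destinations, name the outputs, return them. */
    public static <DestT, InputT> Map<String, String> writeAll(
            List<InputT> inputs,
            Function<InputT, DestT> destFn,      // plays the role of .to(input -> dest)
            Function<DestT, String> prefixFn,    // plays the role of the filename policy
            Supplier<FileSink<DestT, InputT>> sinkFactory) {
        // Group inputs by destination, preserving first-seen order.
        Map<DestT, List<InputT>> byDest = new LinkedHashMap<>();
        for (InputT input : inputs) {
            byDest.computeIfAbsent(destFn.apply(input), d -> new ArrayList<>()).add(input);
        }
        Map<String, String> files = new LinkedHashMap<>();
        for (Map.Entry<DestT, List<InputT>> e : byDest.entrySet()) {
            StringBuilder out = new StringBuilder();
            FileSink<DestT, InputT> sink = sinkFactory.get();
            sink.open(e.getKey(), out);
            for (InputT input : e.getValue()) {
                sink.write(input);
            }
            sink.close();
            // One shard per destination; the suffix is an assumed shard pattern.
            files.put(prefixFn.apply(e.getKey()) + "-00000-of-00001", out.toString());
        }
        return files; // filename -> contents, i.e. "the successfully written filenames"
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"us", "alice"},
            new String[] {"eu", "bob"},
            new String[] {"us", "carol"});
        Map<String, String> files = FileSinkSketch.<String, String[]>writeAll(
            rows, row -> row[0], dest -> "out/" + dest, CsvSink::new);
        files.forEach((name, content) -> System.out.println(name + "\n" + content));
    }
}
```

In this sketch, supporting another format would mean writing another small
FileSink implementation; writeAll would stay unchanged, which is the
modularity the proposal is after.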
