Dear Aviem,

That's a good point. TextIO seems to make a few "string-oriented" assumptions; see TextIO.Write's header and footer support <https://github.com/apache/beam/pull/918>, and the "IO design pattern: Decouple Parsers and Coders" <https://issues.apache.org/jira/browse/BEAM-73> (BEAM-73) discussion that came up during its development.
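TextIO's '\n' record separator is one of those string-oriented assumptions. Below is a minimal, self-contained Java sketch (plain java.io, not Beam's API). The class, its methods, and the 4-byte int "coder" are hypothetical stand-ins. It shows how newline framing breaks once records are coder-encoded bytes, and how a length prefix avoids the problem. A length prefix is one kind of framing a coder-aware FileIO could use:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class FramingSketch {

    // Hypothetical stand-in "coder": an int as 4 big-endian bytes.
    static byte[] encode(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    static int decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    // Newline framing: append '\n' after each encoded record.
    static byte[] newlineDelimited(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            byte[] rec = encode(v);
            out.write(rec, 0, rec.length);
            out.write('\n');
        }
        return out.toByteArray();
    }

    // Splits on every '\n' byte, as a line-oriented reader would.
    static List<byte[]> splitOnNewline(byte[] data) {
        List<byte[]> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                byte[] rec = new byte[i - start];
                System.arraycopy(data, start, rec, 0, rec.length);
                records.add(rec);
                start = i + 1;
            }
        }
        return records;
    }

    // Length-prefixed framing: each record's size precedes its bytes,
    // so the encoded content never has to avoid a reserved byte value.
    static byte[] writeLengthPrefixed(int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (int v : values) {
            byte[] rec = encode(v);
            out.writeInt(rec.length);
            out.write(rec);
        }
        out.flush();
        return bytes.toByteArray();
    }

    static List<Integer> readLengthPrefixed(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<Integer> values = new ArrayList<>();
        while (in.available() > 0) {
            byte[] rec = new byte[in.readInt()];
            in.readFully(rec);
            values.add(decode(rec));
        }
        return values;
    }

    // Encodes with length prefixes, then decodes again.
    static List<Integer> roundTrip(int[] values) {
        try {
            return readLengthPrefixed(writeLengthPrefixed(values));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] values = {10, 42}; // 10 encodes as 00 00 00 0A, and 0x0A is '\n'
        System.out.println("newline framing: "
            + splitOnNewline(newlineDelimited(values)).size() + " records for 2 values");
        System.out.println("length-prefixed: " + roundTrip(values));
    }
}
```

Running main prints `newline framing: 3 records for 2 values` followed by `length-prefixed: [10, 42]`. One trade-off is that a pure length prefix gives a reader no way to find a record boundary from an arbitrary byte offset, which matters when splitting large files. Container formats such as Avro add periodic sync markers for that reason.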
IMHO it would be nice to make TextIO purely about textual content, and perhaps go as far as removing the ability to pass in a coder. To support encoded files, one could consider something like a FileIO that takes a coder and writes/reads the encoded/decoded content to/from a file. For example, AvroIO could be thought of as a FileIO with an AvroCoder.

IO authors, does this sit well with what you had in mind?

-Stas

On Mon, Jan 30, 2017 at 10:24 AM Aviem Zur <aviem...@gmail.com> wrote:

> Hi,
>
> While trying to use TextIO to write/read a binary file rather than String
> lines from a textual file I ran into an issue - the delimiter TextIO uses
> seems to be hardcoded '\n'.
> See `findSeparatorBounds` -
>
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L1024
>
> The use case is to have a file of objects, encoded into bytes using a
> coder. However, '\n' is not a good delimiter here, as you can imagine.
> A similar pattern is found in Spark's `saveAsObjectFile`
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L1512
>
> where they use a more appropriate delimiter, to avoid such issues.
>
> I did not find any unit tests which use TextIO to read anything other than
> Strings.