[ https://issues.apache.org/jira/browse/BEAM-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Christopher Hebert closed BEAM-2586.
------------------------------------
       Resolution: Won't Fix
    Fix Version/s: Not applicable

The approach I proposed isn't a good way to address my use case.

> Accommodate custom delimiters in TextIO
> ---------------------------------------
>
>                 Key: BEAM-2586
>                 URL: https://issues.apache.org/jira/browse/BEAM-2586
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Christopher Hebert
>            Assignee: Davor Bonaci
>            Priority: Minor
>             Fix For: Not applicable
>
>
> We frequently process text files delimited by something other than newlines,
> including files delimited only by end of file.
>
> First option:
> When we want to delimit by commas (or something else), we could use TextIO to
> read in line by line and apply a transform to split each line on commas. When
> we want to delimit by whole file, we could combine the elements of the
> PCollection output from TextIO that come from the same file into one element.
>
> Second option:
> Rather than complicating (and slowing) our pipelines with the methods
> above, we could write custom FileBasedSources for each use case.
>
> Third option:
> Preferably, we'd like to generalize TextIO to accept delimiters other than
> the defaults: \n, \r, \r\n.
>
> I'll attach a pull request showing how we envision this generalization of
> TextIO.
>
> If this is not the direction Beam would like to go with TextIO, then we'll
> stick to maintaining our own TextIO or our own FileBasedSources to achieve
> this functionality.
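
For readers following the thread, the record-splitting behavior the description asks for can be illustrated outside of Beam. The following is only a minimal sketch in plain Java, not Beam's TextIO or FileBasedSource API and not the code from the pull request mentioned above. The class name `DelimitedRecords` and the method `split` are hypothetical, and treating an empty delimiter as "delimit only by end of file" is an assumption made for this illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Apache Beam.
final class DelimitedRecords {
    private DelimitedRecords() {}

    /**
     * Splits everything readable from {@code in} into records, ending a record
     * at each occurrence of {@code delimiter}. Assumption for this sketch: a
     * null or empty delimiter means "delimit only by end of input", so the
     * whole input becomes a single record.
     */
    static List<String> split(Reader in, String delimiter) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean wholeInput = delimiter == null || delimiter.isEmpty();
        try {
            int c;
            // Reads one character at a time; wrap the Reader in a
            // BufferedReader when reading from a real file.
            while ((c = in.read()) != -1) {
                current.append((char) c);
                if (!wholeInput && endsWith(current, delimiter)) {
                    // Drop the delimiter itself and emit the record.
                    current.setLength(current.length() - delimiter.length());
                    records.add(current.toString());
                    current.setLength(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Emit the trailing record, if any, when the input does not end
        // with a delimiter.
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    // Returns true if the buffer currently ends with the given suffix.
    private static boolean endsWith(StringBuilder sb, String suffix) {
        int offset = sb.length() - suffix.length();
        if (offset < 0) {
            return false;
        }
        for (int i = 0; i < suffix.length(); i++) {
            if (sb.charAt(offset + i) != suffix.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}
```

Calling `DelimitedRecords.split(new java.io.StringReader("a,b,c"), ",")` yields three records, while passing an empty delimiter returns the entire input as one record. A real Beam source would additionally have to handle records that straddle split boundaries in a file, which this sketch does not attempt.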