Eugene Kirpichov created BEAM-2641:
--------------------------------------

             Summary: Improve discoverability of TextIO.readAll() as a 
replacement of TextIO.read() for large globs
                 Key: BEAM-2641
                 URL: https://issues.apache.org/jira/browse/BEAM-2641
             Project: Beam
          Issue Type: Improvement
          Components: sdk-java-core
            Reporter: Eugene Kirpichov
            Assignee: Eugene Kirpichov


TextIO.readAll() dramatically outperforms TextIO.read() when reading a very 
large number of files (hundreds of thousands, millions, or more).

However, it is not obvious to a user whose TextIO.read() filepattern matches 
that many files that TextIO.readAll() is the transform to use instead.

We should take a variety of measures to make it more discoverable, e.g.:

* Add a parameter to TextIO.read(), like "withHintManyFiles()"
* Log a message suggesting the use of that hint when splitting a TextIO.read() 
whose filepattern matches a very large number of files
* Improve documentation
* Post something on StackOverflow about this
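
As a rough illustration of the logging idea above, here is a minimal sketch in 
plain Java. It is not Beam code: the class ManyFilesHint, the method 
suggestion(), the 100,000-file threshold and the example filepattern are all 
hypothetical names and values chosen for illustration. It assumes the number 
of matched files is already known at the point where the source is split.

```java
import java.util.Optional;
import java.util.logging.Logger;

/** Hypothetical helper, not part of Beam: decides whether to suggest readAll(). */
public class ManyFilesHint {
    private static final Logger LOG = Logger.getLogger(ManyFilesHint.class.getName());

    // Assumed cutoff; the issue only says "hundreds of thousands or millions".
    static final long MANY_FILES_THRESHOLD = 100_000L;

    /**
     * Returns a suggestion message when the filepattern matched at least
     * MANY_FILES_THRESHOLD files, and an empty Optional otherwise.
     */
    static Optional<String> suggestion(String filepattern, long matchedFiles) {
        if (matchedFiles < MANY_FILES_THRESHOLD) {
            return Optional.empty();
        }
        return Optional.of(String.format(
            "Filepattern %s matched %d files; consider TextIO.readAll() for globs this large.",
            filepattern, matchedFiles));
    }

    public static void main(String[] args) {
        // Hypothetical pattern and count, standing in for a real match result.
        suggestion("/data/logs/*.txt", 250_000L).ifPresent(LOG::warning);
    }
}
```

Returning the message instead of logging it directly keeps the threshold check 
easy to test; where the actual log call would live in TextIO is left open here.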



