Eugene Kirpichov created BEAM-2641:
--------------------------------------
Summary: Improve discoverability of TextIO.readAll() as a
replacement for TextIO.read() for large globs
Key: BEAM-2641
URL: https://issues.apache.org/jira/browse/BEAM-2641
Project: Beam
Issue Type: Improvement
Components: sdk-java-core
Reporter: Eugene Kirpichov
Assignee: Eugene Kirpichov
TextIO.readAll() dramatically outperforms TextIO.read() when reading very large
numbers of files (hundreds of thousands, millions, or more).
However, it is not obvious to users that they should switch to it when the
filepattern passed to TextIO.read() matches that many files.
We should take a variety of measures to make it more discoverable, e.g.:
* Add a parameter to TextIO.read(), like "withHintManyFiles()"
* Log a message suggesting the use of that hint when splitting TextIO.read() if
the filepattern matches a very large number of files
* Improve documentation
* Post something on StackOverflow about this
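
A rough sketch of the logging idea from the list above, in plain Java with no
Beam dependency. Everything here is an assumption made for illustration: the
class ManyFilesHint, the method suggestionFor(), the 100,000-file threshold
and the message wording are hypothetical and not part of the Beam SDK, and
withHintManyFiles() is only the name proposed in this issue.

```java
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of the Beam SDK. */
final class ManyFilesHint {
    // Assumed cutoff. The issue only says "hundreds of thousands or millions".
    static final long MANY_FILES_THRESHOLD = 100_000L;

    private ManyFilesHint() {}

    /**
     * Returns a log message recommending TextIO.readAll() when the number of
     * files matched by the filepattern reaches the threshold, else empty.
     */
    static Optional<String> suggestionFor(String filepattern, long matchedFiles) {
        if (matchedFiles < MANY_FILES_THRESHOLD) {
            return Optional.empty();
        }
        return Optional.of(String.format(
            "Filepattern %s matched %d files; consider TextIO.readAll() "
                + "or the proposed withHintManyFiles() hint.",
            filepattern, matchedFiles));
    }

    public static void main(String[] args) {
        // Example: a glob that matched 250,000 files triggers the suggestion.
        suggestionFor("gs://example-bucket/logs/*", 250_000L)
            .ifPresent(System.err::println);
    }
}
```

In a real implementation the check would presumably run at the point where
TextIO.read() expands the filepattern during splitting, and the message would
go through the SDK's usual logger rather than System.err.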