Chamikara Jayalath created BEAM-2643:
----------------------------------------
Summary: Add TextIO.read_all() to Python SDK
Key: BEAM-2643
URL: https://issues.apache.org/jira/browse/BEAM-2643
Project: Beam
Issue Type: New Feature
Components: sdk-py
Reporter: Chamikara Jayalath
The Java SDK now has a TextIO.readAll() API that allows reading a massive number
of files by replacing the BoundedSource API (which may perform expensive source
operations on the control plane) with ParDo operations.
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L170
This API should be added to the Python SDK as well.
This form of reading files does not currently support dynamic work rebalancing,
but this should not matter much when reading a massive number of relatively
small files. In the future, this API can support dynamic work rebalancing
through Splittable DoFn.
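The core idea (expand file patterns, then read each matched file inside an ordinary per-element step rather than through up-front BoundedSource splitting) can be sketched in plain Python. This is a minimal, runner-free sketch under stated assumptions: `read_all`, `expand_patterns` and `read_file_lines` are hypothetical names for illustration only, not the Beam Python API, and each generator merely stands in for a ParDo step. It is not a port of the Java implementation.

```python
import glob
import os
import tempfile
from typing import Iterable, Iterator


def expand_patterns(patterns: Iterable[str]) -> Iterator[str]:
    """Hypothetical stage 1 (stands in for a ParDo): pattern -> file names."""
    for pattern in patterns:
        # sorted() only makes the output order deterministic for this sketch.
        yield from sorted(glob.glob(pattern))


def read_file_lines(path: str) -> Iterator[str]:
    """Hypothetical stage 2 (stands in for a ParDo): file name -> lines.

    The whole file is consumed by one call, so a single file is never
    split, which is why this style offers no dynamic work rebalancing.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


def read_all(patterns: Iterable[str]) -> Iterator[str]:
    """Hypothetical read_all: chain the two per-element stages.

    In a distributed runner each stage would be applied element by element
    across workers; here they simply run one after another in-process.
    """
    for path in expand_patterns(patterns):
        yield from read_file_lines(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Create three small files, the case this API is aimed at.
        for i in range(3):
            name = os.path.join(tmp, f"part-{i}.txt")
            with open(name, "w", encoding="utf-8") as f:
                f.write(f"file {i} line 1\nfile {i} line 2\n")
        lines = list(read_all([os.path.join(tmp, "part-*.txt")]))
        print(len(lines))  # 6
```

Because each file is handled by a single `read_file_lines` call, one very large file cannot be split across workers; that trade-off is acceptable mainly when the input is many relatively small files.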
cc: [~jkff]
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)