Chamikara Jayalath created BEAM-2643:
----------------------------------------

             Summary: Add TextIO.read_all() to Python SDK
                 Key: BEAM-2643
                 URL: https://issues.apache.org/jira/browse/BEAM-2643
             Project: Beam
          Issue Type: New Feature
          Components: sdk-py
            Reporter: Chamikara Jayalath


The Java SDK now has a TextIO.readAll() API that allows reading a massive
number of files by moving from the BoundedSource API (which may perform
expensive source operations on the control plane) to ParDo operations.

https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L170

An equivalent API, TextIO.read_all(), should be added to the Python SDK as well.
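
To illustrate the idea, here is a minimal plain-Python sketch of the two
per-element steps such a transform could chain together: expanding each file
pattern into file names, then reading each file line by line. It deliberately
does not use any Beam API. The function names (expand_pattern, read_lines,
read_all) are hypothetical and only mimic what ParDo-style steps would do on
each element. This is not the proposed implementation.

```python
import glob


def expand_pattern(pattern):
    """Step 1 (hypothetical): turn one file pattern into matching file names."""
    # Sorted so the output order is deterministic in this sketch.
    return sorted(glob.glob(pattern))


def read_lines(path):
    """Step 2 (hypothetical): read one file and yield its lines."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            # Strip only the trailing newline, keep other whitespace.
            yield line.rstrip("\n")


def read_all(patterns):
    """Chain both steps over an iterable of file patterns.

    Each pattern and each file is handled as an ordinary data element,
    which is the property that lets a real pipeline run these steps on
    workers instead of in a source's up-front splitting logic.
    """
    for pattern in patterns:
        for path in expand_pattern(pattern):
            yield from read_lines(path)
```

Because the patterns are ordinary input elements here, they could themselves
be produced by an earlier step, which a source configured up front does not
allow.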

This form of reading files does not support dynamic work rebalancing for now,
but this should not matter much when reading a massive number of relatively
small files. In the future, this API can support dynamic work rebalancing
through Splittable DoFn.

cc: [~jkff]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
