Chamikara Jayalath created BEAM-360:
---------------------------------------
Summary: Add a framework for creating Python-SDK sources for new
file types
Key: BEAM-360
URL: https://issues.apache.org/jira/browse/BEAM-360
Project: Beam
Issue Type: New Feature
Components: sdk-py
Reporter: Chamikara Jayalath
Assignee: Chamikara Jayalath
We already have a framework for creating new sources for the Beam Python SDK -
https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/iobase.py#L326
It would be great to add a framework on top of this that encapsulates the
logic common to file-based sources. This framework could include the
following features:
(1) glob expansion
(2) support for new file systems
(3) dynamic work rebalancing based on byte offsets
(4) support for reading compressed files
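
To make the list above concrete, here is a minimal standalone sketch of
features (1), (3) and (4) using only the Python standard library. It is not
the Beam API and not a proposed design: the names expand_glob,
split_into_ranges and read_records are hypothetical, records are assumed to
be newline-delimited, and compression is assumed to be gzip detected by file
extension. Dynamic rebalancing itself is not implemented; the byte ranges
are only the units such rebalancing could operate on.

```python
import glob
import gzip
import os


def expand_glob(pattern):
    """Feature (1): expand a file pattern into a sorted list of files."""
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


def is_compressed(path):
    # Assumption: gzip compression is detected from the extension only.
    return path.endswith('.gz')


def split_into_ranges(path, desired_bundle_size):
    """Groundwork for feature (3): yield (path, start, stop) byte ranges.

    A gzip file cannot be read from an arbitrary byte offset, so it is
    emitted as a single range that covers the whole file.
    """
    size = os.path.getsize(path)
    if is_compressed(path) or size == 0:
        yield (path, 0, size)
        return
    start = 0
    while start < size:
        stop = min(start + desired_bundle_size, size)
        yield (path, start, stop)
        start = stop


def read_records(path, start, stop):
    """Yield the lines whose first byte lies in [start, stop).

    Feature (4): gzip files are decompressed transparently and always
    read in full, ignoring the offsets.
    """
    if is_compressed(path):
        with gzip.open(path, 'rb') as f:
            for line in f:
                yield line.rstrip(b'\n')
        return
    with open(path, 'rb') as f:
        if start > 0:
            # Back up one byte and discard through the next newline. If a
            # record begins exactly at `start`, this consumes only the
            # newline that ends the previous record.
            f.seek(start - 1)
            f.readline()
        while f.tell() < stop:
            line = f.readline()
            if not line:
                break
            yield line.rstrip(b'\n')
```

Because each record is assigned to the range that contains its first byte,
reading every range of a file yields each record exactly once, even when a
range boundary falls in the middle of a line.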
The Java SDK has a similar framework, available at -
https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSource.java