Chamikara Jayalath created BEAM-1925:
----------------------------------------
Summary: Make DoFn invocation logic of Python SDK more extensible
Key: BEAM-1925
URL: https://issues.apache.org/jira/browse/BEAM-1925
Project: Beam
Issue Type: Improvement
Components: sdk-py
Reporter: Chamikara Jayalath
Assignee: Chamikara Jayalath
The DoFn invocation logic of the Python SDK currently lives in the DoFnRunner
class:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/common.py#L54
At initialization, DoFnRunner parses the DoFn and creates local state, which it
later uses when invoking the DoFn methods process, start_bundle, and
finish_bundle. For example, it stores a list of ArgPlaceholder objects in its
state to facilitate invocation of the process method.
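
As a rough illustration of the placeholder idea, here is a simplified sketch.
It is not the actual code in common.py: the ArgPlaceholder class below is a
stand-in, and fill_placeholders and the argument names are made up for the
example.

```python
class ArgPlaceholder(object):
    """Hypothetical marker for an argument that is filled in per element."""

    def __init__(self, name):
        self.name = name


def fill_placeholders(args_template, values):
    """Returns a copy of args_template with each placeholder replaced.

    Hypothetical helper; 'values' maps placeholder names to actual values.
    """
    return [values[arg.name] if isinstance(arg, ArgPlaceholder) else arg
            for arg in args_template]


# Built once at initialization: 'element' and 'window' are not known yet.
template = [ArgPlaceholder('element'), 10, ArgPlaceholder('window')]

# Filled in on every call to process.
args = fill_placeholders(template, {'element': 'a', 'window': 'w0'})
```

The point is only that the argument list is parsed once and patched per
element, which is the kind of state that is currently tied to DoFnRunner.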
We will need to extend this functionality when adding new features to the DoFn
class (for example, to support Splittable DoFn [1]), so I think it would be
good to refactor this code to be more extensible.
I think a good approach is to add DoFnInvoker and DoFnSignature classes,
similar to those in the Java SDK [2].
In this approach:
A DoFnSignature captures the signature of a DoFn including methods and
arguments.
A DoFnInvoker implements a particular way DoFn methods will be executed
(initially we'll have simple and per-window invokers [3]).
A runner uses DoFnRunner to execute methods of a given DoFn. At initialization,
DoFnRunner creates a DoFnSignature and a DoFnInvoker for the given DoFn.
DoFnSignature and DoFnInvoker methods will be used by the SplittableDoFn
implementation as well.
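
A minimal sketch of how these three pieces could fit together is below. The
class names follow the proposal, but every method and attribute name
(process_args, invoke_process, and so on) is an assumption made for
illustration, the DoFns are plain classes rather than apache_beam DoFns, and
the "per-window" invoker is reduced to passing the window through. It assumes
Python 3 for inspect.signature.

```python
import inspect


class DoFnSignature(object):
    """Hypothetical: records which methods a DoFn defines and their args."""

    def __init__(self, do_fn):
        self.do_fn = do_fn
        # Argument names of the bound process method ('self' is excluded).
        self.process_args = list(inspect.signature(do_fn.process).parameters)
        self.has_start_bundle = hasattr(do_fn, 'start_bundle')
        self.has_finish_bundle = hasattr(do_fn, 'finish_bundle')


class SimpleInvoker(object):
    """Hypothetical: calls process with the element only."""

    def __init__(self, signature):
        self.signature = signature

    def invoke_process(self, element, window=None):
        return list(self.signature.do_fn.process(element))


class PerWindowInvoker(SimpleInvoker):
    """Hypothetical: also hands the window to process."""

    def invoke_process(self, element, window=None):
        return list(self.signature.do_fn.process(element, window=window))


class DoFnRunner(object):
    """Builds a signature and picks an invoker at initialization."""

    def __init__(self, do_fn):
        self.signature = DoFnSignature(do_fn)
        if 'window' in self.signature.process_args:
            self.invoker = PerWindowInvoker(self.signature)
        else:
            self.invoker = SimpleInvoker(self.signature)

    def process(self, element, window=None):
        return self.invoker.invoke_process(element, window)


class UpperFn(object):
    def process(self, element):
        yield element.upper()


class TagWindowFn(object):
    def process(self, element, window=None):
        yield (element, window)


simple_out = DoFnRunner(UpperFn()).process('a')
windowed_out = DoFnRunner(TagWindowFn()).process('a', window='w0')
```

With this split, supporting a new kind of DoFn would mean adding a new invoker
(and possibly new signature fields) rather than changing DoFnRunner itself.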
[1] https://docs.google.com/document/d/1h_zprJrOilivK2xfvl4L42vaX4DMYGfH1YDmi-s_ozM/edit#heading=h.e6patunrpiql
[2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignature.java
[3] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/common.py#L200