[ https://issues.apache.org/jira/browse/BEAM-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15993935#comment-15993935 ]
ASF GitHub Bot commented on BEAM-1925:
--------------------------------------

GitHub user sb2nov opened a pull request:

    https://github.com/apache/beam/pull/2848

    [BEAM-1925] validate DoFn at pipeline creation time

    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:

     - [ ] Make sure the PR title is formatted like:
       `[BEAM-<Jira issue #>] Description of pull request`
     - [ ] Make sure tests pass via `mvn clean verify`.
     - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
       number, if there is one.
     - [ ] If this contribution is large, please file an Apache
       [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

    ---

    R: @chamikaramj PTAL

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sb2nov/beam BEAM-1925-validate-dofn

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/2848.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2848

----
commit 26b09ba71b60c330d9605a8baa039900a50c4c9d
Author: Sourabh Bajaj <sourabhba...@google.com>
Date:   2017-05-02T22:47:54Z

    [BEAM-1925] validate DoFn at pipeline creation time

----


> Make DoFn invocation logic of Python SDK more extensible
> --------------------------------------------------------
>
>                 Key: BEAM-1925
>                 URL: https://issues.apache.org/jira/browse/BEAM-1925
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py
>            Reporter: Chamikara Jayalath
>            Assignee: Chamikara Jayalath
>
> DoFn invocation logic of Python SDK is currently in DoFnRunner class.
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/common.py#L54
> At initialization of this, we parse a DoFn and create local state. We use
> this state when invoking DoFn methods process, start_bundle, and
> finish_bundle.
> For example, we store a list of ArgPlaceholder objects within
> the state of DoFnRunner to facilitate invocation of the process method.
> We will need to extend this functionality when adding new features to the DoFn
> class (for example, to support Splittable DoFn [1]). So I think it's good to
> refactor this code to be more extensible.
> I think a good approach for this is to add DoFnInvoker and DoFnSignature
> classes similar to the Java SDK [2].
> In this approach:
> A DoFnSignature captures the signature of a DoFn, including methods and
> arguments.
> A DoFnInvoker implements a particular way DoFn methods will be executed
> (initially we'll have simple and per-window invokers [3]).
> A runner uses DoFnRunner to execute methods of a given DoFn. At
> initialization, DoFnRunner creates a DoFnSignature and a DoFnInvoker for the
> given DoFn.
> DoFnSignature and DoFnInvoker methods will be used by the SplittableDoFn
> implementation as well.
> [1]
> https://docs.google.com/document/d/1h_zprJrOilivK2xfvl4L42vaX4DMYGfH1YDmi-s_ozM/edit#heading=h.e6patunrpiql
> [2]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignature.java
> [3]
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/common.py#L200



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
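
The DoFnSignature / DoFnInvoker / DoFnRunner split described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the pull request: the `DoFn` base class here is a stand-in so the example runs without Beam installed, and `MethodWrapper`, `SimpleInvoker`, the `invoke_*` method names, and the specific validation rules are hypothetical choices made for this example.

```python
import inspect


class DoFn:
    """Stand-in for the Beam DoFn class (assumption: not the real class)."""

    def process(self, element, *args, **kwargs):
        raise NotImplementedError

    def start_bundle(self):
        pass

    def finish_bundle(self):
        pass


class MethodWrapper:
    """Hypothetical helper: one DoFn method plus its positional parameters."""

    def __init__(self, do_fn, name):
        self.name = name
        self.method = getattr(do_fn, name)
        positional_kinds = (inspect.Parameter.POSITIONAL_ONLY,
                            inspect.Parameter.POSITIONAL_OR_KEYWORD)
        # inspect.signature on a bound method already omits `self`.
        params = [p for p in inspect.signature(self.method).parameters.values()
                  if p.kind in positional_kinds]
        self.args = [p.name for p in params]
        self.required_args = [p.name for p in params
                              if p.default is inspect.Parameter.empty]


class DoFnSignature:
    """Captures the methods and arguments of a DoFn and validates them once."""

    def __init__(self, do_fn):
        if not isinstance(do_fn, DoFn):
            raise TypeError("expected a DoFn, got %r" % type(do_fn))
        self.do_fn = do_fn
        self.process_method = MethodWrapper(do_fn, "process")
        self.start_bundle_method = MethodWrapper(do_fn, "start_bundle")
        self.finish_bundle_method = MethodWrapper(do_fn, "finish_bundle")
        self._validate()

    def _validate(self):
        # Illustrative rules only; the real checks may differ.
        if not self.process_method.args:
            raise ValueError("process() must accept an element argument")
        for m in (self.start_bundle_method, self.finish_bundle_method):
            if m.required_args:
                raise ValueError("%s() must not require arguments" % m.name)


class SimpleInvoker:
    """One way of executing DoFn methods: pass the element straight through."""

    def __init__(self, signature):
        self.signature = signature

    def invoke_start_bundle(self):
        self.signature.start_bundle_method.method()

    def invoke_process(self, element):
        return self.signature.process_method.method(element)

    def invoke_finish_bundle(self):
        self.signature.finish_bundle_method.method()


class DoFnRunner:
    """Creates a signature and an invoker at initialization, then delegates."""

    def __init__(self, do_fn):
        # Validation happens here, before any element is processed.
        self.signature = DoFnSignature(do_fn)
        self.invoker = SimpleInvoker(self.signature)

    def start(self):
        self.invoker.invoke_start_bundle()

    def process(self, element):
        return list(self.invoker.invoke_process(element) or [])

    def finish(self):
        self.invoker.invoke_finish_bundle()


class SplitWords(DoFn):
    def process(self, element):
        return element.split()


runner = DoFnRunner(SplitWords())
runner.start()
words = runner.process("hello beam")
runner.finish()
```

Because the signature is built in the runner's constructor, a malformed DoFn (for instance one whose `process` takes no element) fails when the runner is created rather than when the first element arrives. A per-window invoker or a SplittableDoFn invoker could be swapped in for `SimpleInvoker` without changing `DoFnRunner`.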