[
https://issues.apache.org/jira/browse/BEAM-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001495#comment-16001495
]
ASF GitHub Bot commented on BEAM-1925:
--------------------------------------
GitHub user sb2nov opened a pull request:
https://github.com/apache/beam/pull/2963
[BEAM-1925] validate DoFn at pipeline creation time
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
- [ ] Make sure the PR title is formatted like:
`[BEAM-<Jira issue #>] Description of pull request`
- [ ] Make sure tests pass via `mvn clean verify`.
- [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
number, if there is one.
- [ ] If this contribution is large, please file an Apache
[Individual Contributor License
Agreement](https://www.apache.org/licenses/icla.pdf).
---
R: @robertwb PTAL
Same changes as https://github.com/apache/beam/pull/2848 without the new
file creation.
The performance seems identical to master on the benchmark gist.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sb2nov/beam BEAM-1925-validate-dofn-2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/beam/pull/2963.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2963
----
commit 6ee4c243b7f1a942f66f4f6f5fb19b32fb334a7c
Author: Sourabh Bajaj
Date: 2017-05-08T20:33:15Z
[BEAM-1925] validate DoFn at pipeline creation time
----
> Make DoFn invocation logic of Python SDK more extensible
> --------------------------------------------------------
>
> Key: BEAM-1925
> URL: https://issues.apache.org/jira/browse/BEAM-1925
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py
> Reporter: Chamikara Jayalath
> Assignee: Chamikara Jayalath
>
> The DoFn invocation logic of the Python SDK currently lives in the DoFnRunner class:
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/common.py#L54
> When a DoFnRunner is initialized, we parse the DoFn and create local state. We use
> this state when invoking the DoFn methods process, start_bundle, and
> finish_bundle. For example, we store a list of ArgPlaceholder objects within
> the state of DoFnRunner to facilitate invocation of the process method.
> We will need to extend this functionality when adding new features to the DoFn
> class (for example, to support Splittable DoFn [1]). So I think it's good to
> refactor this code to be more extensible.
> I think a good approach for this is to add DoFnInvoker and DoFnSignature
> classes similar to those in the Java SDK [2].
> In this approach:
> A DoFnSignature captures the signature of a DoFn including methods and
> arguments.
> A DoFnInvoker implements a particular way DoFn methods will be executed
> (initially we'll have simple and per-window invokers [3]).
> A runner uses DoFnRunner to execute methods of a given DoFn. At
> initialization, DoFnRunner creates a DoFnSignature and a DoFnInvoker for the
> given DoFn.
> DoFnSignature and DoFnInvoker methods will be used by the SplittableDoFn
> implementation as well.
> [1]
> https://docs.google.com/document/d/1h_zprJrOilivK2xfvl4L42vaX4DMYGfH1YDmi-s_ozM/edit#heading=h.e6patunrpiql
> [2]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignature.java
> [3]
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/common.py#L200
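The split proposed in the issue above (a signature object, an invoker, and a runner that builds both at initialization) might look roughly like the following. This is only an illustrative sketch, not the code in the pull request: the DoFn base class here is a stand-in rather than apache_beam.DoFn, and SimpleInvoker, validate(), and run() are hypothetical names chosen for the example. Only the simple invoker is shown, the per-window invoker is left out, and the validation rule (bundle hooks take no arguments) is an assumption made to show where a creation-time check could sit.

```python
import inspect


class DoFn:
    """Stand-in base class for this sketch; NOT apache_beam.DoFn."""

    def start_bundle(self):
        pass

    def process(self, element):
        raise NotImplementedError

    def finish_bundle(self):
        pass


class DoFnSignature:
    """Captures the methods of a DoFn and the argument names of each."""

    METHODS = ("start_bundle", "process", "finish_bundle")

    def __init__(self, do_fn):
        if not isinstance(do_fn, DoFn):
            raise TypeError("expected a DoFn, got %s" % type(do_fn).__name__)
        self.do_fn = do_fn
        # Bound methods are inspected, so `self` is not in the argument list.
        self.args = {
            name: list(inspect.signature(getattr(do_fn, name)).parameters)
            for name in self.METHODS
        }

    def validate(self):
        """Hypothetical rule: bundle hooks take no arguments."""
        for name in ("start_bundle", "finish_bundle"):
            if self.args[name]:
                raise ValueError("%s must not take arguments" % name)
        if not self.args["process"]:
            raise ValueError("process must accept an element")


class SimpleInvoker:
    """One particular way of executing the DoFn methods."""

    def __init__(self, signature):
        self.signature = signature

    def invoke_process(self, element):
        return self.signature.do_fn.process(element)


class DoFnRunner:
    """Builds a signature and an invoker once, at initialization."""

    def __init__(self, do_fn):
        self.signature = DoFnSignature(do_fn)
        self.signature.validate()  # fails here, before any element is processed
        self.invoker = SimpleInvoker(self.signature)

    def run(self, elements):
        do_fn = self.signature.do_fn
        do_fn.start_bundle()
        outputs = []
        for element in elements:
            outputs.extend(self.invoker.invoke_process(element))
        do_fn.finish_bundle()
        return outputs


class SplitWords(DoFn):
    def process(self, element):
        return element.split()


class BadFn(DoFn):
    def start_bundle(self, unexpected):  # violates the hypothetical rule
        pass

    def process(self, element):
        return [element]
```

With these definitions, DoFnRunner(SplitWords()) constructs normally, while DoFnRunner(BadFn()) raises ValueError in the constructor, so a malformed DoFn is rejected when the runner is created rather than when the first element arrives.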
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)