GitHub user kennknowles opened a pull request:
https://github.com/apache/incubator-beam/pull/756
Replace ParDo with MapElements and FlatMapElements where possible
---
The commits ended up covering fairly separate topics. They can be reviewed
individually or together as one medium-sized change.
1. The first commit replaces `ParDo` with `MapElements` and
`FlatMapElements` where it is easy to do so.
2. While debugging, I noticed that `DoFn` used a less-powerful form of
`TypeDescriptor` and switched trivially to the enhanced version.
3. The root cause of the issues with `MapElements` and `FlatMapElements` was
that the input type descriptor was not being used. Making it available
involved a moderate refactor. In the process I broke some display data tests,
fixed them, and enhanced the display data for `SimpleFunction`.
If reviewers insist, I can try to alter this commit history.
R: @bjchambers AND @swegner
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kennknowles/incubator-beam map-flatmap
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/756.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #756
----
commit b041197382f6a4ea5f6ad93f5e6f32aa1212937f
Author: Kenneth Knowles <[email protected]>
Date: 2016-07-27T21:23:15Z
Replace ParDo with simpler transforms where possible
There are a number of places in the Java SDK where we use
ParDo.of(DoFn) when MapElements or other higher-level
composites are applicable and readable. This change
alters a number of those.
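
To illustrate why the higher-level transforms read better, here is a minimal plain-Java sketch. It does not use the Beam API: `parDoLike`, `mapElementsLike`, and `flatMapElementsLike` are hypothetical stand-ins that operate on lists. The idea it models is that a one-to-one or one-to-many function can be wrapped by the general "emit zero or more outputs" form, so callers need not write that form by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical stand-ins; none of these names come from the Beam SDK.
class MapVersusParDoSketch {
    // General form, loosely analogous to ParDo.of(DoFn): the callback
    // receives an element plus an emitter and decides how many outputs to produce.
    static <I, O> List<O> parDoLike(List<I> input, BiConsumer<I, Consumer<O>> fn) {
        List<O> output = new ArrayList<>();
        for (I element : input) {
            fn.accept(element, output::add);
        }
        return output;
    }

    // One output per input: the caller supplies only a plain function.
    static <I, O> List<O> mapElementsLike(List<I> input, Function<I, O> fn) {
        return MapVersusParDoSketch.<I, O>parDoLike(
            input, (element, emit) -> emit.accept(fn.apply(element)));
    }

    // Zero or more outputs per input: the function returns a list that is flattened.
    static <I, O> List<O> flatMapElementsLike(List<I> input, Function<I, List<O>> fn) {
        return MapVersusParDoSketch.<I, O>parDoLike(
            input, (element, emit) -> fn.apply(element).forEach(emit));
    }
}
```

With these stand-ins, `mapElementsLike(words, String::length)` replaces a hand-written callback that calls the emitter exactly once per element.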
commit 2b28a87cd9b39e145e6bfcd0b04ed63221dad271
Author: Kenneth Knowles <[email protected]>
Date: 2016-07-29T01:44:39Z
Make DoFn use instance-based TypeDescriptor
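
As a rough illustration, assuming "instance-based" means resolving type parameters from the concrete class of a function instance, the plain-reflection sketch below shows the underlying mechanism. `Fn` is a hypothetical stand-in for a class such as `DoFn`. It is not Beam's `TypeDescriptor`, which handles far more cases than direct subclasses.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical stand-in classes; this is not the Beam implementation.
class TypeCaptureSketch {
    abstract static class Fn<I, O> {
        abstract O apply(I input);

        // Reads the type arguments recorded on this instance's concrete class.
        Type inputType() {
            return typeArgument(0);
        }

        Type outputType() {
            return typeArgument(1);
        }

        private Type typeArgument(int index) {
            // Simplification: only direct subclasses of Fn are handled here.
            ParameterizedType superType =
                (ParameterizedType) getClass().getGenericSuperclass();
            return superType.getActualTypeArguments()[index];
        }
    }

    // The anonymous subclass records Fn<String, Integer> in its class metadata.
    static Fn<String, Integer> length() {
        return new Fn<String, Integer>() {
            @Override
            Integer apply(String input) {
                return input.length();
            }
        };
    }
}
```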
commit 5a95226719831e19f86703ac9838bbb5ec2c2362
Author: Kenneth Knowles <[email protected]>
Date: 2016-07-29T01:47:04Z
Use input type in coder inference for MapElements and FlatMapElements
Previously, the input TypeDescriptor was unknown, so we would fail
to infer a coder for things like MapElements.of(SimpleFunction<T, T>)
even if the input PCollection provided a coder for T.
Now, the input type is plumbed appropriately and the coder is inferred.
This required internal changes to explicitly support good display data.
While doing this, I just added display data to SimpleFunction by analogy
with DoFn.
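
The failure mode described above can be sketched in plain Java, under the assumption that it stems from an unresolved type variable. For a function declared as `Fn<T, T>`, reflection on the function alone yields only the variable `T`; once the input type is supplied, the output type follows. `Fn`, `identity`, and `outputType` are hypothetical names for illustration, and the actual Beam change works through `TypeDescriptor` and coder lookup rather than this simplified check.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;

// Hypothetical stand-in classes; this is not the Beam implementation.
class InputTypeInferenceSketch {
    abstract static class Fn<I, O> {
        abstract O apply(I input);

        Type typeArgument(int index) {
            // Simplification: only direct subclasses of Fn are handled here.
            ParameterizedType superType =
                (ParameterizedType) getClass().getGenericSuperclass();
            return superType.getActualTypeArguments()[index];
        }
    }

    // A generic function: its class metadata records Fn<T, T>, not a concrete type.
    static <T> Fn<T, T> identity() {
        return new Fn<T, T>() {
            @Override
            T apply(T input) {
                return input;
            }
        };
    }

    // If the declared output is the same type variable as the declared input,
    // substitute the input type known from upstream; otherwise keep the declared output.
    static Type outputType(Fn<?, ?> fn, Type knownInputType) {
        Type declaredInput = fn.typeArgument(0);
        Type declaredOutput = fn.typeArgument(1);
        if (declaredOutput instanceof TypeVariable && declaredOutput.equals(declaredInput)) {
            return knownInputType;
        }
        return declaredOutput;
    }
}
```

Here `outputType(identity(), String.class)` resolves to `String.class`, whereas inspecting the function alone would stop at the unresolved variable `T`.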
----