GitHub user tgroh opened a pull request:
https://github.com/apache/incubator-beam/pull/406
[BEAM-155] Use custom Assertions in Spark Streaming Tests
Spark Streaming Side Inputs behave differently than the Beam Model. As
the underlying implementation of PAssert is based on side inputs, this
means that Streaming Spark Tests that use SideInputs as the actuals are
non-portable.
More specifically, this enables pipeline-construction-time enforcement
that a preexisting side input must be in a window compatible with the
Global Window (otherwise the side input WindowFn should throw an
exception, e.g. in
[PartitioningWindowFn](https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PartitioningWindowFn.java#L46)).
Modify FlattenStreamingTest, KafkaStreamingTest, and
SimpleStreamingWordCountTest to group all of the contents of the
asserted PCollection into a single key, and assert the contents of that
concatenation, rather than doing so via PAssert and Side Input.
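The grouping approach described above can be sketched in plain Java. This is only an illustrative model, not the code in this pull request: it uses no Beam or Spark APIs, and the names `SingleKeyAssertion`, `ONLY_KEY`, `groupUnderSingleKey`, and `assertContainsInAnyOrder` are hypothetical. It assumes the assertion should ignore element order, which the description does not state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java model of a single-key assertion; it does not use Beam or Spark APIs.
class SingleKeyAssertion {

    // Hypothetical constant key under which every element is grouped.
    static final String ONLY_KEY = "all";

    // Models keying each element with the same key and then grouping by that key.
    static <T> Map<String, List<T>> groupUnderSingleKey(List<T> elements) {
        return elements.stream().collect(Collectors.groupingBy(e -> ONLY_KEY));
    }

    // Checks that the grouped contents match the expected elements, ignoring order.
    static <T extends Comparable<T>> void assertContainsInAnyOrder(
            List<T> actual, List<T> expected) {
        Map<String, List<T>> grouped = groupUnderSingleKey(actual);
        List<T> concatenated =
                new ArrayList<>(grouped.getOrDefault(ONLY_KEY, Collections.emptyList()));
        List<T> sortedExpected = new ArrayList<>(expected);
        Collections.sort(concatenated);
        Collections.sort(sortedExpected);
        if (!concatenated.equals(sortedExpected)) {
            throw new AssertionError(
                    "Expected " + sortedExpected + " but got " + concatenated);
        }
    }

    public static void main(String[] args) {
        // Passes silently: same elements, different order.
        assertContainsInAnyOrder(Arrays.asList("b", "a", "a"), Arrays.asList("a", "a", "b"));
    }
}
```

Because every element lands under one key, the check runs once over the whole collection and needs no side input to carry the actual values.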
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tgroh/incubator-beam spark_custom_streaming_assertions
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/406.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #406
----
commit ccf3412b06862dddaabbd0869aa0aefca3c77156
Author: Thomas Groh <[email protected]>
Date: 2016-05-31T20:27:43Z
Use custom Assertions in Spark Streaming Tests
Spark Streaming Side Inputs behave differently than the Beam Model. As
the underlying implementation of PAssert is based on side inputs, this
means that Streaming Spark Tests that use SideInputs as the actuals are
non-portable.
Modify FlattenStreamingTest, KafkaStreamingTest, and
SimpleStreamingWordCountTest to group all of the contents of the
asserted PCollection into a single key, and assert the contents of that
concatenation, rather than doing so via PAssert and Side Input.
----