GitHub user echauchot opened a pull request:
https://github.com/apache/beam/pull/3114
[BEAM-160] Port 'NexMark Queries' to Beam for use as integration test
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
- [X] Make sure the PR title is formatted like:
`[BEAM-<Jira issue #>] Description of pull request`
- [X] Make sure tests pass via `mvn clean verify`.
- [X] Replace `<Jira issue #>` in the title with the actual Jira issue
number, if there is one.
- [X] If this contribution is large, please file an Apache
[Individual Contributor License
Agreement](https://www.apache.org/licenses/icla.pdf).
---
R: @dpmills @dhalperi
CC: @stasl @aviemzur @aljoscha because we discussed NexMark together :)
CC: @mshields822 I know you do not work on it anymore, but you might be
interested :)
CC: @ssisk for reflection on IT tests
This is a port of the NexMark queries to Beam, to be used as integration
tests.
This can also be used for A/B testing (no-regression or performance
comparison between 2 versions of the same engine or of the same runner).
This is a continuation of the previous PR
(https://github.com/apache/beam/pull/99) from Mark Shields.
The code has changed quite a bit: some queries now use new Beam
APIs and there were some big refactors. More importantly, we can now run all the
queries on all the runners. Nevertheless, there are still some open issues in
Nexmark (https://github.com/iemejia/beam/issues) and in Beam upstream (see
issue links in https://issues.apache.org/jira/browse/BEAM-160).
Here is a doc that presents the NexMark components and pseudocode of the
queries to ease the review:
https://drive.google.com/open?id=1VgnGiVu8vSfm7Et-xAtQYv0PlEpqeyfmhpQUNPmWRJs
Everything needed to launch the queries is in the README. There is also a
support matrix for the runners.
Please do not squash the commits, because there are several authors: Mark,
Ismaël and me.
Happy reviewing :)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/iemejia/beam BEAM-160-nexmark
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/beam/pull/3114.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3114
----
commit a7364a56acddf93a041535ac3fe4561d4d655c6d
Author: Mark Shields <[email protected]>
Date: 2016-03-28T23:25:29Z
NexMark
commit 6c013cb98b0011094889f8ae8e7e2646317cb813
Author: Mark Shields <[email protected]>
Date: 2016-06-03T00:32:49Z
Port unit tests, cleanup pom and add license to readme
commit 316b7e6684cfb78340484b736e473bb967d54361
Author: Ismaël Mejía <[email protected]>
Date: 2016-11-30T17:43:02Z
Update Nexmark to the current Beam snapshot 0.7.0
Refactor from InProcessRunner to DirectRunner
Add Spark driver
Add Apex runner
Refine error logging per class and add log4j properties
Move README to top level and add section on configuration
Move project to the specific nexmark directory
Fix existing issues to pass verify -Prelease
Add running on the DirectRunner documentation
commit 2e47081f4c602c55b33ba783f014a8e1a8761acc
Author: Etienne Chauchot <[email protected]>
Date: 2017-03-09T15:45:25Z
Add comments on queries improvements and fix compilation config
commit e2a84c293dc56916f1ae4808de9efc961eac22f2
Author: Etienne Chauchot <[email protected]>
Date: 2017-03-15T08:52:36Z
Make NexmarkRunner generic and remove coupling with Google Dataflow
issue #28
commit 1ce5fe901995482e1c678daac5070eef204af52d
Author: Etienne Chauchot <[email protected]>
Date: 2017-03-15T14:25:41Z
Activate monitoring on NexmarkSparkRunner
issue #28
commit 319f7fc55e548a3ec0689e05e4bc0e48fff7964b
Author: Etienne Chauchot <[email protected]>
Date: 2017-03-15T16:15:58Z
Re-enable spark and flink in pom
issue #28
commit 34061011945521528562cea0fd9ba5841ff6508f
Author: Etienne Chauchot <[email protected]>
Date: 2017-03-15T16:34:31Z
Activate monitoring on NexmarkFlinkRunner
issue #28
Fix compilation issue after rebase + make checkstyle happy again
commit b68fc71655783537749816e620f1bf697e8ed9a8
Author: Etienne Chauchot <[email protected]>
Date: 2017-03-16T10:38:08Z
Fix QueryTest
Workaround for issue #22 + extra cleaning
commit 3d96de335e13f882d23d9519c48e24c4aabc4f25
Author: Etienne Chauchot <[email protected]>
Date: 2017-03-16T14:57:18Z
Replace junit asserts by hamcrest asserts
Set numEvents in test to the minimum number that makes the tests pass
issue #15
comments, improve asserts (hamcrest), reformat
For now make generate monothreaded
commit 939ff4fe9f8ca7182dd631c36a7d4725e1d06750
Author: Ismaël Mejía <[email protected]>
Date: 2017-03-21T17:29:20Z
Fix Apex driver and update execution matrix
commit b1f33655eebf49f988e8503d9a996b646880bd4c
Author: Ismaël Mejía <[email protected]>
Date: 2017-03-23T18:32:45Z
Refactor classes into packages
The new hierarchy has logically based packages for:
- drivers
- io
- model
- queries
- sources
commit 962919daf7956a474b1ccb917f4134bd0c4fc1f7
Author: Etienne Chauchot <[email protected]>
Date: 2017-03-24T13:29:08Z
Fix query5: Add comment on key lifting
issue #30
commit 1a8f83ca3f6558791b412efe1c930bd65d66fe44
Author: Etienne Chauchot <[email protected]>
Date: 2017-03-24T14:54:12Z
Fix query10: Add comment for strange groupByKey
issue #31
commit 3f6d62b175960789e34ce57b5ebc73022c358679
Author: Etienne Chauchot <[email protected]>
Date: 2017-03-24T15:59:59Z
Fix query11: Replace Count.perKey by Count.perElement
issue #32
commit e8a6add06433c50d6bee6ca183f5b3dad64a817d
Author: Ismaël Mejía <[email protected]>
Date: 2017-03-29T08:10:13Z
Fix compile after ParDo refactor
commit 39242faeec54b3405605e05e7bfd204a0bf6d731
Author: Etienne Chauchot <[email protected]>
Date: 2017-04-03T13:18:04Z
Fix query3: Use GlobalWindow to comply with the State/Timer APIs
Issue #7
commit 38d42d3b017d40096a777ea35e66c07155dcb591
Author: Etienne Chauchot <[email protected]>
Date: 2017-04-03T14:48:08Z
Fix query3: Use timer for personState expiration in GlobalWindow
Issue #29
commit 5571b2f655b62ce1f630247612bcc3463ddee112
Author: Etienne Chauchot <[email protected]>
Date: 2017-04-03T14:50:51Z
Fix Runner categories in tests
commit 845b91a76693f17c3d10e934262956d4fdf1ba1e
Author: Etienne Chauchot <[email protected]>
Date: 2017-04-03T16:26:17Z
Fix query12: Replace Count.perKey by Count.perElement
Issue #34
commit 4475649ab537fa8cfaaa08a6cdd1fe8d6943ec62
Author: Etienne Chauchot <[email protected]>
Date: 2017-04-11T14:14:19Z
Add streaming unit tests
Issue #37
commit a6517859af59a2fc2b8ceb872a344b23803aa70f
Author: Etienne Chauchot <[email protected]>
Date: 2017-04-12T09:13:58Z
Add trigger to global windowing in query3. Adding labels to query tests
commit a09ebb56f5022b588aaf77a674cf37c481a3b5ef
Author: Etienne Chauchot <[email protected]>
Date: 2017-04-12T14:19:57Z
Update unit tests: results are no more linked to the number of events
issue #22
commit 4a768b1089655bcddf2771862cd84c8eba5c8682
Author: Ismaël Mejía <[email protected]>
Date: 2017-04-13T08:47:54Z
Fix compile after PubsubIO refactor
commit 978ae6ba94c1224abf7b834b3fbae9d2b2e9b65a
Author: Ismaël Mejía <[email protected]>
Date: 2017-04-13T09:07:50Z
Change Nexmark pom structure to mirror other modules on Beam
commit 6bcebb78b1b062ebeb36a0bb2ae9d77824d02c09
Author: Etienne Chauchot <[email protected]>
Date: 2017-04-14T15:13:59Z
Fix Spark streaming termination via waitUntilFinish and timeout config
issue #39
commit d788d6abb50574c978db8fa044a156a9cfcfdfa6
Author: Ismaël Mejía <[email protected]>
Date: 2017-04-19T09:22:42Z
Fix compile after sideOutput and split refactor
commit cac80dd461885d822c120481e7c33fcc38869356
Author: Ismaël Mejía <[email protected]>
Date: 2017-04-21T10:21:55Z
Remove Accumulators and switch to the Metrics API
commit 9b874b886104f1db46f298c89c07dc32eaa88255
Author: Etienne Chauchot <[email protected]>
Date: 2017-04-24T15:08:50Z
Update execution matrix
issue #45
commit 5fc854f972818848692faf960d60458e4a1a5727
Author: Etienne Chauchot <[email protected]>
Date: 2017-04-28T08:29:38Z
Fix compile after Coders and Pubsub refactor
----