GitHub user aljoscha opened a pull request:
https://github.com/apache/incubator-beam/pull/335
[BEAM-270] Support Timestamps/Windows in Flink Batch
This is a cleanup version of #328
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/aljoscha/incubator-beam
flink-windowed-value-batch-cleaned
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/335.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #335
----
commit e6d48c0cd7eac93f14f7c4df3eee8a8936b9cc51
Author: Pei He <[email protected]>
Date: 2016-04-27T02:55:34Z
Replace dataflow stagingLocation with tempLocation in example module.
commit 0e9041ae0f53b08879e569c1c0501ce5c4651720
Author: Thomas Groh <[email protected]>
Date: 2016-04-28T20:42:36Z
Use CommittedResult in InMemoryWatermarkManager
This enable unprocessed elements to be handled in the Watermark manager
after they are added to the CommittedResult structure.
commit cb71b563b92e4ed14fb84bb07056ad9a69986691
Author: Jean-Baptiste Onofré <[email protected]>
Date: 2016-04-29T13:26:01Z
[BEAM-154] Use dependencyManagement and pluginManagement to keep all
modules sync in term of version
commit 86a68c2b3137b63a919d703d228b32ea4e1bbf4b
Author: Jason Kuster <[email protected]>
Date: 2016-04-30T00:25:23Z
Add Matcher serializer in TestPipeline.
commit 21d355ab471e961f5be5bdf3de2771cea18cde73
Author: Dan Halperin <[email protected]>
Date: 2016-05-02T20:25:40Z
beam-wide: blacklist Throwables.propagate and remove uses
This is a forward-port of
https://github.com/GoogleCloudPlatform/DataflowJavaSDK/pull/232
commit 003a300eb352a3ba09980711814d0ba125e95e69
Author: Pei He <[email protected]>
Date: 2016-05-02T22:26:39Z
Fix the java doc for Combine.perKey and ApproximateQuantiles
commit 8797b140d7566fcb4e26afeb1ddb4206c5462550
Author: Pei He <[email protected]>
Date: 2016-05-03T18:47:56Z
[BEAM-48] Upgrade bigquery library to v2-rev292-1.21.0
commit 376f88e9278650647f3c83c3265b3af6ed81acb4
Author: Dan Halperin <[email protected]>
Date: 2016-05-03T19:04:13Z
[BEAM-168] IntervalBEB: remove deprecated function
The pre-commit wordcount test will confirm that this does not break the
Cloud Dataflow worker.
commit b58741731176f2386f1b4e31fae72092209a3871
Author: Dan Halperin <[email protected]>
Date: 2016-05-03T20:07:04Z
[BEAM-255] Write: add limited logging
This will help, for all sinks, users and developers gain insight into where
time
is spent. (Enabling DEBUG level will provide more insight.)
commit 677c41232795116c8595994c3b9f59e1f50e0fb6
Author: Mark Shields <[email protected]>
Date: 2016-04-27T01:41:37Z
[BEAM-53] Add PubsubApiaryClient, PubsubTestClient
* Move PubsubClient and friends out of sdk.io and into sdk.util.
* Add PubsubApiaryClient since gRPC has onerous boot class path
requirements which I don't wish to inflict upon other runners.
* Add PubsubTestClient in preparation for unit testing
PubsubUnbounded{Source,Sink}.
* Unit tests for all of above.
commit dc0f6bb992cc5c119ae5098241e0ad237747cee5
Author: Thomas Groh <[email protected]>
Date: 2016-05-02T17:03:43Z
Move ReadyCheckingSideInputReader to util
This SideInputReader allows callers to check for a side input being
available before attempting to read the contents
commit 2366fa5616bb09ba3e7b1dc312baa92440154f86
Author: Thomas Groh <[email protected]>
Date: 2016-05-02T17:04:20Z
Add PushbackSideInputDoFnRunner
This DoFnRunner wraps a DoFnRunner and provides an additional method to
process an element in all the windows where all side inputs are ready,
returning any elements that it could not process.
commit eba8a49320f2ad9756e924fe36b6b4db4071bf8a
Author: Kenneth Knowles <[email protected]>
Date: 2016-05-02T20:11:12Z
Add TestFlinkPipelineRunner to FlinkRunnerRegistrar
This makes the runner available for selection by integration tests.
commit 16a2c08a62b38b6b735e751ea31bbe60f0effba6
Author: Kenneth Knowles <[email protected]>
Date: 2016-05-02T21:04:20Z
Configure RunnableOnService tests for Flink in batch mode
Today Flink batch supports only global windows. This is a situation we
intend our build to allow, eventually via JUnit category filtering.
For now all the test classes that use non-global windows are excluded
entirely via maven configuration. In the future, it should be on a
per-test-method basis.
commit 50012a4edce44bc57e6a9a22bb61f0f5dfe37a70
Author: Thomas Groh <[email protected]>
Date: 2016-05-03T20:22:13Z
Refactor CompletionCallbacks
The default and timerful completion callbacks are identical, excepting
their calls to evaluationContext.commitResult; factor that code into a
common location.
commit 7ecef4e688e6877c256da7ceb6bd5347f2409536
Author: Kenneth Knowles <[email protected]>
Date: 2016-05-03T20:22:59Z
Create runners/core module for artifact org.apache.beam:runners-core
This is strictly creating the module and moving one easy class to it.
Many of the utilities in org.apache.beam.util and subpackages should
move as developments allow.
commit 8529c69a1ce9ae91363d3a8e93659eba019c0379
Author: Pei He <[email protected]>
Date: 2016-05-04T00:55:11Z
[BEAM-48] Refactor BigQueryServices to support extract and query jobs
commit 9ad04abc7cc49b415a1aa33a8baf7d058561f722
Author: Luke Cwik <[email protected]>
Date: 2016-05-04T03:02:28Z
[BEAM-256] Address wrong import order and add millis to output path for
WordCountIT
commit 658e6099dae849679c70d76dc85ba0a0ef6c2e96
Author: Kenneth Knowles <[email protected]>
Date: 2016-05-04T03:09:16Z
Fix direct runner pom & deps
commit 7a8b1cc1f8877ea7d4b9a8ec9b3b765804424a58
Author: Kenneth Knowles <[email protected]>
Date: 2016-05-04T04:11:39Z
Speed up non-release builds
commit c2314189c78d19d5f93a0a6cd86896290833c54e
Author: Kenneth Knowles <[email protected]>
Date: 2016-05-06T17:54:41Z
Remove unused threadCount from integration tests
commit 4d9e4005bcae229f5ce023dbb1524e1f899cce4d
Author: Kenneth Knowles <[email protected]>
Date: 2016-05-06T17:55:16Z
Disable Flink streaming integration tests for now
commit e58c2159574a89535d6d27e26bb24339d3eac1ef
Author: Kenneth Knowles <[email protected]>
Date: 2016-05-06T19:49:55Z
Special casing job exec AssertionError in TestFlinkPipelineRunner
commit 97a388ac54a454279552551429582d9d174e9d7e
Author: Aljoscha Krettek <[email protected]>
Date: 2016-05-06T07:38:55Z
Add hamcrest dependency to Flink Runner
Without it the RunnableOnService tests seem to not work
commit de71ecf3957be94aa6b9faf07ba423bd94c3803c
Author: Aljoscha Krettek <[email protected]>
Date: 2016-05-06T06:26:50Z
Fix Dangling Flink DataSets
commit d16522b0d71e7d81f9f58be1ab57e5c17b2db0a7
Author: Aljoscha Krettek <[email protected]>
Date: 2016-05-13T12:17:50Z
Fix faulty Flink Flatten when PCollectionList is empty
commit 7972dcdcaf084505823d8aa283c9cfe867bdc219
Author: Aljoscha Krettek <[email protected]>
Date: 2016-05-10T11:53:03Z
[BEAM-270] Support Timestamps/Windows in Flink Batch
With this change we always use WindowedValue<T> for the underlying Flink
DataSets instead of just T. This allows us to support windowing as well.
This changes also a lot of other stuff enabled by the above:
- Use WindowedValue throughout
- Add proper translation for Window.into()
- Make side inputs window aware
- Make GroupByKey and Combine transformations window aware, this
includes support for merging windows. GroupByKey is implemented as a
Combine with a concatenating CombineFn, for simplicity
This removes Flink specific transformations for things that are handled
by builtin sources/sinks, among other things this:
- Removes special translation for AvroIO.Read/Write and
TextIO.Read/Write
- Removes special support for Write.Bound, this was not working properly
and is now handled by the Beam machinery that uses DoFns for this
- Removes special translation for binary Co-Group, the code was still
in there but was never used
With this change all RunnableOnService tests run on Flink Batch.
commit 072343d6e9bc0105177cea908bd34740aa6c3e21
Author: Aljoscha Krettek <[email protected]>
Date: 2016-05-13T12:41:20Z
Remove superfluous Flink Tests, Fix those that stay in
All of the stuff in the removed ITCases is covered (in more detail) by
the RunnableOnService tests.
commit 4292f04ebe30fada9158ec736a58a2ea56feb8ef
Author: Aljoscha Krettek <[email protected]>
Date: 2016-05-14T09:48:47Z
Fix last last outstanding test
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---