GitHub user mxm opened a pull request:
https://github.com/apache/incubator-beam/pull/104
[BEAM-158] add support for bounded sources in streaming
Apart from a few improvements, this PR introduces bounded sources in
streaming. The BoundedSource wrapper (`SourceInputFormat`) is the same as for
the batch part of the runner. The translator assigns ingestion time watermarks
and processing time timestamps upon reading from the source. We could make this
more flexible in terms of watermark generation if we had an UnboundedSource
wrapper for a BoundedSource.
Perhaps we could have common utility for runners to deal with serialization
of PipelineOptions. At some point, they have to be shipped. I had to change the
serialization code because I was experiencing a serialization bug which led to
a serialization loop. Debugging this was almost impossible because the stack
trace doesn't show all serialization calls due to some magic in the VM. I
didn't find any cyclic references between the PipelineOptions and Flink
components. I'm assuming this is a bug and the workaround using byte array
serialization of the options is fair enough. See `SourceInputFormat`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mxm/incubator-beam BEAM-158
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/104.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #104
----
commit f26674e7a7b30ee3f992edfc8e473df2a7ee3e80
Author: Maximilian Michels <[email protected]>
Date: 2016-03-30T14:43:04Z
[flink] improve InputFormat wrapper and ReadSourceITCase
commit 03404c7f4a5656bb5c5c0a2510f12e33292fef01
Author: Maximilian Michels <[email protected]>
Date: 2016-03-30T17:05:27Z
[flink] improvements to UnboundedSource translation
commit de574136a5b4ba9b75231b321d0190e23af3bac2
Author: Maximilian Michels <[email protected]>
Date: 2016-03-31T08:18:01Z
[BEAM-158] add support for bounded sources in streaming
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---