[
https://issues.apache.org/jira/browse/BEAM-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739453#comment-16739453
]
Lukasz Gajowy commented on BEAM-3323:
-------------------------------------
Currently, we have an UnboundedSyntheticSource that generates (in a
configurable way) a PCollection<KV<byte[], byte[]>> of deterministic data ([link
to source|https://github.com/apache/beam/tree/master/sdks/java/io/synthetic]).
Maybe it could be used for the purposes described in this issue?
CC: [~kenn], [~iemejia]
> Create a generator of finite-but-unbounded PCollection's for integration
> testing
> --------------------------------------------------------------------------------
>
> Key: BEAM-3323
> URL: https://issues.apache.org/jira/browse/BEAM-3323
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-core
> Reporter: Eugene Kirpichov
> Priority: Major
>
> Several IOs have features that exhibit nontrivial behavior when writing
> unbounded PCollections - e.g. WriteFiles with windowed writes; BigQueryIO.
> We need to be able to write integration tests for these features.
> Currently we have two ways to generate an unbounded PCollection without
> reading from a real-world external streaming system such as Pub/Sub or Kafka:
> 1) TestStream, which only works in the direct runner - sufficient for some tests
> but not all: definitely not sufficient for large-scale tests or for tests
> that need to interact with a real instance of the external system (e.g.
> BigQueryIO). It is also quite verbose to use.
> 2) GenerateSequence.from(0) without a .to(), which returns an infinite amount
> of data.
> GenerateSequence.from(a).to(b) returns a finite amount of data, but returns
> it as a bounded PCollection, and doesn't report the watermark.
> I think the right thing to do here, for now, is to give
> GenerateSequence.from(a).to(b) an option (e.g. ".asUnbounded()") that makes
> it return an unbounded PCollection, go through UnboundedSource (or
> potentially via SDF in runners that support it), and track the watermark
> properly (or via a configurable watermark fn).
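The behavior proposed in the description above can be sketched without any Beam dependency. The class below is a hypothetical stand-in, not Beam API: the names FiniteUnboundedSequence, advance(), getWatermark() and isDone() are invented for illustration, the upper bound is treated as exclusive, and each element's timestamp is assumed to equal its value. It only shows the intended contract: emit a finite range as if from an unbounded reader, report a watermark along the way, and move the watermark to "end of time" once the range is exhausted so downstream windows can close.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a finite-but-unbounded sequence reader (not Beam API). */
final class FiniteUnboundedSequence {
    /** Watermark value meaning "no more data will ever arrive". */
    static final long END_OF_TIME = Long.MAX_VALUE;

    private final long to;   // exclusive upper bound (assumption)
    private long next;       // next value to emit
    private long watermark;  // lower bound on timestamps of future elements

    FiniteUnboundedSequence(long from, long to) {
        if (from > to) {
            throw new IllegalArgumentException("from must be <= to");
        }
        this.next = from;
        this.to = to;
        this.watermark = from;
    }

    /** Returns the next value, or null once the range is exhausted. */
    Long advance() {
        if (next >= to) {
            // Range finished: advance the watermark to the end of time.
            watermark = END_OF_TIME;
            return null;
        }
        long value = next++;
        // Toy assumption: an element's timestamp equals its value.
        watermark = value;
        return value;
    }

    long getWatermark() {
        return watermark;
    }

    boolean isDone() {
        return watermark == END_OF_TIME;
    }

    public static void main(String[] args) {
        FiniteUnboundedSequence seq = new FiniteUnboundedSequence(0, 3);
        List<Long> out = new ArrayList<>();
        for (Long v = seq.advance(); v != null; v = seq.advance()) {
            out.add(v);
        }
        System.out.println(out + " done=" + seq.isDone());
    }
}
```

A real implementation would presumably live behind UnboundedSource or an SDF as the description suggests, and the "configurable watermark fn" would replace the fixed timestamp-equals-value rule used here.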