[ https://issues.apache.org/jira/browse/BEAM-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286900#comment-16286900 ]
Kenneth Knowles commented on BEAM-3323: --------------------------------------- Doesn't TestStream + SDF yield plenty of events? > Create a generator of finite-but-unbounded PCollection's for integration > testing > -------------------------------------------------------------------------------- > > Key: BEAM-3323 > URL: https://issues.apache.org/jira/browse/BEAM-3323 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core > Reporter: Eugene Kirpichov > Assignee: Kenneth Knowles > > Several IOs have features that exhibit nontrivial behavior when writing > unbounded PCollection's - e.g. WriteFiles with windowed writes; BigQueryIO. > We need to be able to write integration tests for these features. > Currently we have two ways to generate an unbounded PCollection without > reading from a real-world external streaming system such as pubsub or kafka: > 1) TestStream, which only works in direct runner - sufficient for some tests > but not all: definitely not sufficient for large-scale tests or for tests > that need to interact with a real instance of the external system (e.g. > BigQueryIO). It is also quite verbose to use. > 2) GenerateSequence.from(0) without a .to(), which returns an infinite amount > of data. > GenerateSequence.from(a).to(b) returns a finite amount of data, but returns > it as a bounded PCollection, and doesn't report the watermark. > I think the right thing to do here, for now, is to make > GenerateSequence.from(a).to(b) have an option (e.g. ".asUnbounded()", where > it will return an unbounded PCollection, go through UnboundedSource (or > potentially via SDF in runners that support it), and track the watermark > properly (or via a configurable watermark fn). -- This message was sent by Atlassian JIRA (v6.4.14#64029)