piter75 commented on issue #26329: URL: https://github.com/apache/beam/issues/26329#issuecomment-1517629623
Thanks for responding @kkdoon.

Our use case for `BoundedSource` in a streaming job stems from wanting a hot start whenever we start or restart the job. Part of our pipeline depends on a join between streams with very different event frequencies: one side delivers thousands of messages per second, while the other may not have matching messages for a day. We solved this frequency mismatch by loading the history of the "slow moving" stream from BigQuery and then making a union with the stream of messages that comes straight from the PubSub topic. This way we have a union that is both complete from the start and still unbounded during the pipeline run.

Unit testing this issue may be difficult because it is triggered on a specific runner in a specific run mode, and I wouldn't like to test the behaviour of the BigQuery source with the runner tests. I will try to come up with a reproduction example.

What runner are you using, by the way?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
