Hi all,

Thank you, Thomas, for starting this discussion, and Pablo, for sharing your ideas. FWIW, we discussed this in the context of the Core Beam Transform Load Tests that we are working on right now [1]. If generating synthetic data becomes possible for portable streaming pipelines, we could use it in our work to test Python streaming scenarios.
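
To make the shape of such a generator concrete, here is a rough, runner-independent sketch in plain Python of the semantics Thomas describes in the quoted message: a trigger parameterized with an interval and a max count, followed by a Map step that builds the payloads. This only simulates the behaviour and is not Beam or Flink API; the names `generate_ticks` and `make_payload`, the injectable `sleep` parameter, and the payload fields are all hypothetical and chosen for illustration.

```python
import time
from typing import Callable, Iterator, Optional


def generate_ticks(
    interval_s: float = 0.0,
    max_count: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[int]:
    """Yield sequence numbers 0, 1, 2, ... with a pause between them.

    Hypothetical stand-in for a runner-native trigger transform.
    max_count=None means unbounded, as a streaming source would be.
    """
    n = 0
    while max_count is None or n < max_count:
        # Pause between elements, but not before the first one.
        if n > 0 and interval_s > 0:
            sleep(interval_s)
        yield n
        n += 1


def make_payload(seq: int) -> dict:
    """Map step: turn a bare tick into whatever payload the test needs."""
    return {"id": seq, "key": f"key-{seq % 3}", "value": seq * seq}


# Bounded run with no delay, as a load test with a fixed element count might use.
payloads = list(map(make_payload, generate_ticks(interval_s=0.0, max_count=5)))
```

Keeping the trigger free of payload logic means the same generator could feed different test scenarios by swapping only the Map function.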

[1] https://s.apache.org/GVMa

On Fri, 28 Sep 2018 at 08:18, Pablo Estrada <[email protected]> wrote:

> Hi Thomas, all,
> yes, this is quite important for testing, and in fact I'd think it's
> important to streamline the insertion of native sources from different
> runners to make the current runners more usable. But that's another topic.
>
> For generators of synthetic data, I had a couple ideas (and this will show
> my limited knowledge about Flink and Streaming, but oh well):
>
> - Flink experts: Is it possible to add a pure-Beam generator that will do
> something like: Impulse -> ParDo(generate multiple elements) -> Forced
> "Write" to Flink (e.g. something like a reshuffle), and then have Flink
> manage the parallelism for stages downstream from that?
>
> - If this is not possible, it may be worth writing some transform in Flink
> / other runners that can be plugged in by inserting a custom URN. In fact,
> it may be a good idea to streamline the insertion of native sources for
> each runner based on some sort of CustomURNTransform() ?
>
> I hope I did not butcher those explanations too badly...
> Best
> -P.
>
> On Thu, Sep 27, 2018, 5:55 PM Thomas Weise <[email protected]> wrote:
>
>> There were a few discussions how we can facilitate testing for portable
>> streaming pipelines with the Flink runner. The problem is that we currently
>> don't have streaming sources in the Python SDK.
>>
>> One way to support testing could be a generator that extends the idea of
>> Impulse to provide a Flink native trigger transform, optionally
>> parameterized with an interval and max count.
>>
>> Test pipelines could then follow the generator with a Map function that
>> creates whatever payloads are desirable.
>>
>> Thoughts?
>>
>> Thanks,
>> Thomas
