Hi Thomas, all,
Yes, this is quite important for testing, and in fact I think it's
important to streamline the insertion of native sources from different
runners to make the current runners more usable. But that's another topic.

For generators of synthetic data, I had a couple of ideas (and this will
show my limited knowledge of Flink and streaming, but oh well):

- Flink experts: Is it possible to add a pure-Beam generator that will do
something like: Impulse -> ParDo(generate multiple elements) -> Forced
"Write" to Flink (e.g. something like a reshuffle), and then have Flink
manage the parallelism for stages downstream from that?

- If this is not possible, it may be worth writing a native transform in
Flink (or other runners) that can be plugged in by inserting a custom URN.
In fact, it may be a good idea to streamline the insertion of native
sources for each runner based on some sort of CustomURNTransform()?
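
To make the first idea a bit more concrete, here is a rough sketch of the
pure-Beam generator in the Python SDK. It is only an illustration under
assumptions: the names generate_elements and build_pipeline, the count
parameter, and the record layout are made up for this example, and it emits
a bounded batch of elements rather than an interval-driven stream. Whether
the Flink runner actually spreads the downstream work after the Reshuffle is
exactly the open question above.

```python
def generate_elements(_impulse, count=10):
    """Expand the single Impulse element into `count` synthetic records.

    `count` and the record layout are hypothetical, chosen for illustration.
    """
    for i in range(count):
        yield {"id": i, "payload": "event-%d" % i}


def build_pipeline(pipeline, count=10):
    """Impulse -> generate many elements -> Reshuffle -> Map to payloads."""
    # Imported lazily so generate_elements can be tried without Beam installed.
    import apache_beam as beam

    return (
        pipeline
        # Impulse emits exactly one element (an empty byte string).
        | "Impulse" >> beam.Impulse()
        # Fan that single element out into many synthetic records.
        | "Generate" >> beam.FlatMap(generate_elements, count=count)
        # Break fusion here so the runner can redistribute the elements
        # (assumption: the runner parallelizes the stages after this point).
        | "Reshuffle" >> beam.Reshuffle()
        # Test pipelines would replace this with whatever payload they need.
        | "ToPayload" >> beam.Map(lambda record: record["payload"])
    )
```

With Beam installed, build_pipeline(p) can be applied inside a
"with beam.Pipeline() as p:" block, and a test would swap the final Map for
its own payload function.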

I hope I did not butcher those explanations too badly...
Best
-P.

On Thu, Sep 27, 2018, 5:55 PM Thomas Weise <[email protected]> wrote:

> There were a few discussions about how we can facilitate testing for
> portable streaming pipelines with the Flink runner. The problem is that
> we currently don't have streaming sources in the Python SDK.
>
> One way to support testing could be a generator that extends the idea of
> Impulse to provide a Flink native trigger transform, optionally
> parameterized with an interval and max count.
>
> Test pipelines could then follow the generator with a Map function that
> creates whatever payloads are desirable.
>
> Thoughts?
>
> Thanks,
> Thomas
>
>
