Thanks for sharing the link; this looks very promising!

For the synthetic source, if we need a runner-native trigger mechanism,
then it should probably just emit an empty byte array, like the Impulse
implementation does, and everything else could be left to SDK-specific
transforms downstream. We don't have support for timers in the
portable Flink runner yet. With timers, there would be no need for a
runner-native URN, and it could work just as Pablo described.
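
To make the split concrete, here is a minimal pure-Python sketch of the idea.
It does not use any Beam or Flink API; `trigger`, `with_payloads`,
`interval_s` and `max_count` are hypothetical names chosen for illustration.
The trigger stands in for the runner-native part and only emits empty byte
arrays, while the payload creation is an ordinary downstream function.

```python
import time
from typing import Any, Callable, Iterable, Iterator, Optional


def trigger(interval_s: float = 0.0,
            max_count: Optional[int] = None) -> Iterator[bytes]:
    """Hypothetical stand-in for a runner-native trigger.

    Emits an empty byte array per tick, optionally waiting `interval_s`
    seconds between ticks and stopping after `max_count` ticks.
    """
    emitted = 0
    while max_count is None or emitted < max_count:
        if emitted > 0 and interval_s > 0:
            time.sleep(interval_s)  # wait only between ticks
        yield b""
        emitted += 1


def with_payloads(ticks: Iterable[bytes],
                  make_payload: Callable[[int], Any]) -> Iterator[Any]:
    """SDK-side step: ignore the empty tick and build the test payload."""
    for seq, _ in enumerate(ticks):
        yield make_payload(seq)


# Three ticks, each turned into a small synthetic record downstream.
events = list(with_payloads(trigger(max_count=3),
                            lambda seq: {"id": seq, "body": "x" * 4}))
```

The point of the split is that the runner only has to know how to fire; what
the elements look like stays entirely in the SDK-specific transform.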


On Fri, Sep 28, 2018 at 3:09 AM Łukasz Gajowy <[email protected]>
wrote:

> Hi all,
>
> thank you, Thomas, for starting this discussion, and Pablo for sharing the
> ideas. FWIW, we discussed this in the context of the Core Beam Transform
> Load Tests that we are working on right now [1]. If generating synthetic
> data becomes possible for portable streaming pipelines, we could use it in
> our work to test Python streaming scenarios.
>
> [1] https://s.apache.org/GVMa
>
> On Fri, Sep 28, 2018 at 8:18 AM Pablo Estrada <[email protected]> wrote:
>
>> Hi Thomas, all,
>> yes, this is quite important for testing, and in fact I'd think it's
>> important to streamline the insertion of native sources from different
>> runners to make the current runners more usable. But that's another topic.
>>
>> For generators of synthetic data, I had a couple of ideas (and this will
>> show my limited knowledge of Flink and streaming, but oh well):
>>
>> - Flink experts: Is it possible to add a pure-Beam generator that will do
>> something like: Impulse -> ParDo(generate multiple elements) -> Forced
>> "Write" to Flink (e.g. something like a reshuffle), and then have Flink
>> manage the parallelism for stages downstream from that?
>>
>> - If this is not possible, it may be worth writing some transform in
>> Flink / other runners that can be plugged in by inserting a custom URN. In
>> fact, it may be a good idea to streamline the insertion of native sources
>> for each runner based on some sort of CustomURNTransform() ?
>>
>> I hope I did not butcher those explanations too badly...
>> Best
>> -P.
>>
>> On Thu, Sep 27, 2018, 5:55 PM Thomas Weise <[email protected]> wrote:
>>
>>> There were a few discussions about how we can facilitate testing for
>>> portable streaming pipelines with the Flink runner. The problem is that we
>>> currently don't have streaming sources in the Python SDK.
>>>
>>> One way to support testing could be a generator that extends the idea of
>>> Impulse to provide a Flink native trigger transform, optionally
>>> parameterized with an interval and max count.
>>>
>>> Test pipelines could then follow the generator with a Map function that
>>> creates whatever payloads are desirable.
>>>
>>> Thoughts?
>>>
>>> Thanks,
>>> Thomas
>>>
>>>
