Reviving this thread. I think SDF is a pretty big risk for Spark runner
streaming. Holden, is it correct that Spark appears to have no way at all
to produce an infinite DStream from a finite RDD? Maybe we can somehow
dynamically create a new DStream for every initial restriction, with each
DStream obtained from a Receiver that under the hood actually runs the SDF?
(This is of course less efficient than what a timer-capable runner would
do, and I have doubts about the fault tolerance.)
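
For reference, here is a minimal sketch of the re-run loop from Holden's
design quoted below, in plain Python rather than actual Spark code. The
assumptions are mine: the names run_until_finished and bounded_sum are
hypothetical, list comprehensions stand in for RDD map/filter/union, and
I read the T slot of the pair as "unfinished work to re-run" and the V
slot as a finished output.

```python
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")  # input / residual work
V = TypeVar("V")  # finished output

# Hypothetical step type: returns (residual, output) with exactly one side set.
Step = Callable[[T], Tuple[Optional[T], Optional[V]]]


def run_until_finished(inputs: Iterable[T], step: Step) -> List[V]:
    """Re-run `step` on unfinished elements until none remain."""
    pending = list(inputs)
    finished: List[V] = []
    while pending:
        # Stands in for rdd.map(step) with a bounded amount of work per call.
        results = [step(x) for x in pending]
        # Stands in for filtering into two collections and unioning outputs.
        finished.extend(v for r, v in results if r is None)
        pending = [r for r, v in results if r is not None]
    return finished


def bounded_sum(state: Tuple[int, int], budget: int = 3):
    """Toy step: sum n + (n-1) + ... + 1, doing at most `budget` additions."""
    n, acc = state
    for _ in range(budget):
        if n == 0:
            return (None, acc)  # finished: output populated
        acc += n
        n -= 1
    # Budget used up: hand back a residual unless the work happens to be done.
    return ((n, acc), None) if n > 0 else (None, acc)


if __name__ == "__main__":
    print(run_until_finished([(5, 0), (0, 0), (3, 0)], bounded_sum))
```

The budget parameter plays the role of the "K seconds" limit; a real
runner would bound wall-clock time instead of iteration count.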

On Wed, Mar 14, 2018 at 10:26 PM Reuven Lax <re...@google.com> wrote:

> How would timers be implemented? By outputting and reprocessing, the same
> way you proposed for SDF?
>
>
> On Wed, Mar 14, 2018 at 7:25 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> So the timers would have to be in our own code.
>>
>> On Wed, Mar 14, 2018 at 5:18 PM Eugene Kirpichov <kirpic...@google.com>
>> wrote:
>>
>>> Does Spark have support for timers? (I know it has support for state)
>>>
>>> On Wed, Mar 14, 2018 at 4:43 PM Reuven Lax <re...@google.com> wrote:
>>>
>>>> Could we alternatively use a state mapping function to keep track of
>>>> the computation so far instead of outputting V each time? (also the
>>>> progress so far is probably of a different type R rather than V).
>>>>
>>>>
>>>> On Wed, Mar 14, 2018 at 4:28 PM Holden Karau <hol...@pigscanfly.ca>
>>>> wrote:
>>>>
>>>>> So we had a quick chat about what it would take to add something like
>>>>> SplittableDoFns to Spark. I'd done some sketchy thinking about this last
>>>>> year but didn't get very far.
>>>>>
>>>>> My back-of-the-envelope design was as follows:
>>>>> For input type T
>>>>> Output type V
>>>>>
>>>>> Implement a mapper which outputs type (T, V)
>>>>> and if the computation finishes, T will be populated; otherwise V will be.
>>>>>
>>>>> For determining how long to run, we'd either run for up to K seconds
>>>>> or listen for a signal on a port.
>>>>>
>>>>> Once we're done running, we take the result and filter for the ones
>>>>> with T and V into separate collections, re-run until finished,
>>>>> and then union the results.
>>>>>
>>>>>
>>>>> This is maybe not a great design but it was minimally complicated and
>>>>> I figured terrible was a good place to start and improve from.
>>>>>
>>>>>
>>>>> Let me know your thoughts, especially the parts where this is worse
>>>>> than I remember because it's been a while since I thought about this.
>>>>>
>>>>>
>>>>> --
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>
>>>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>
