That is possible. I'll take people's date/time suggestions and create a
simple online poll with them.

On Fri, Aug 31, 2018 at 2:22 AM Robert Bradshaw <[email protected]> wrote:

> Thanks for taking this up. I added some comments to the doc. A
> European-friendly time for discussion would be great.
>
> On Fri, Aug 31, 2018 at 3:14 AM Lukasz Cwik <[email protected]> wrote:
>
>> I came up with a proposal[1] for a progress model based solely on the
>> backlog, in which splits are based upon the remaining backlog we want
>> the SDK to split at. I also give recommendations to runner authors as to
>> how an autoscaling system could work based upon the measured backlog. A lot
>> of the past discussion around progress reporting and splitting has
>> been about finding an optimal solution. After reading a lot of
>> information about work stealing, I don't believe there is a general
>> solution, and it really is up to SplittableDoFns to be well behaved. I did
>> not do much work in classifying what a well behaved SplittableDoFn is
>> though. Much of this work builds off ideas that Eugene had documented in
>> the past[2].
>>
>> I could use the community's wide knowledge of different I/Os to see if
>> computing the backlog is practical in the way that I'm suggesting and to
>> gather people's feedback.
>>
>> If there is a lot of interest, I would like to hold a community video
>> conference between Sept 10th and 14th about this topic. Please reply with
>> your availability by Sept 6th if you're interested.
>>
>> 1: https://s.apache.org/beam-bundles-backlog-splitting
>> 2: https://s.apache.org/beam-breaking-fusion
>>
>> On Mon, Aug 13, 2018 at 10:21 AM Jean-Baptiste Onofré <[email protected]>
>> wrote:
>>
>>> Awesome !
>>>
>>> Thanks Luke !
>>>
>>> I plan to work with you and others on this one.
>>>
>>> Regards
>>> JB
>>> On Aug 13, 2018, at 19:14, Lukasz Cwik <[email protected]> wrote:
>>>>
>>>> I wanted to reach out that I will be continuing from where Eugene left
>>>> off with SplittableDoFn. I know that many of you have done a bunch of work
>>>> with IOs and/or runner integration for SplittableDoFn and would appreciate
>>>> your help in advancing this awesome idea. If you have questions or things
>>>> you want to get reviewed related to SplittableDoFn, feel free to send them
>>>> my way or include me on anything SplittableDoFn related.
>>>>
>>>> I was part of several discussions with Eugene and I think the biggest
>>>> outstanding design portion is to figure out how dynamic work rebalancing
>>>> would play out with the portability APIs. This includes reporting of
>>>> progress from within a bundle. I know that Eugene had shared some documents
>>>> in this regard but the position / split models didn't work too cleanly in a
>>>> unified sense for bounded and unbounded SplittableDoFns. It will likely
>>>> take me a while to gather my thoughts, but I could use your expertise as to how
>>>> compatible these ideas are with respect to IOs and runners
>>>> Flink/Spark/Dataflow/Samza/Apex/... and obviously help during
>>>> implementation.
>>>>
>>>
