Thanks to everyone who has shown interest in this topic through the
questions already asked on the doc, and to those interested in having this
discussion. I have set up this Doodle poll so people can provide their
availability:
https://doodle.com/poll/nrw7w84255xnfwqy

I'll send out the chosen time, based upon people's availability, along with
a Hangout link by end of day Friday, so please mark your availability using
the link above.

The agenda of the meeting will be as follows:
* Overview of the proposal
* Enumerate and discuss/answer questions brought up in the meeting

Note that all questions and any discussions/answers provided will be added
to the doc for those who are unable to attend.

On Fri, Aug 31, 2018 at 9:47 AM Jean-Baptiste Onofré <[email protected]>
wrote:

> +1
>
> Regards
> JB
> On 31 August 2018, at 18:22, Lukasz Cwik <[email protected]> wrote:
>>
>> That is possible. I'll take people's date/time suggestions and create a
>> simple online poll with them.
>>
>> On Fri, Aug 31, 2018 at 2:22 AM Robert Bradshaw <[email protected]>
>> wrote:
>>
>>> Thanks for taking this up. I added some comments to the doc. A
>>> European-friendly time for discussion would be great.
>>>
>>> On Fri, Aug 31, 2018 at 3:14 AM Lukasz Cwik <[email protected]> wrote:
>>>
>>>> I came up with a proposal[1] for a progress model based solely on the
>>>> backlog, proposing that splits be based upon the remaining backlog we
>>>> want the SDK to split at. I also give recommendations to runner authors
>>>> as to how an autoscaling system could work based upon the measured
>>>> backlog. Many past discussions around progress reporting and splitting
>>>> have focused on finding an optimal solution. After reading a lot of
>>>> information about work stealing, I don't believe there is a general
>>>> solution, and it really is up to SplittableDoFns to be well behaved. I
>>>> did not do much work in classifying what a well behaved SplittableDoFn
>>>> is, though. Much of this work builds off ideas that Eugene had
>>>> documented in the past[2].
>>>>
>>>> I could use the community's wide knowledge of different I/Os to see if
>>>> computing the backlog is practical in the way that I'm suggesting and to
>>>> gather people's feedback.
>>>>
>>>> If there is a lot of interest, I would like to hold a community video
>>>> conference between Sept 10th and 14th about this topic. Please reply with
>>>> your availability by Sept 6th if you're interested.
>>>>
>>>> 1: https://s.apache.org/beam-bundles-backlog-splitting
>>>> 2: https://s.apache.org/beam-breaking-fusion
>>>>
>>>> On Mon, Aug 13, 2018 at 10:21 AM Jean-Baptiste Onofré <[email protected]>
>>>> wrote:
>>>>
>>>>> Awesome !
>>>>>
>>>>> Thanks Luke !
>>>>>
>>>>> I plan to work with you and others on this one.
>>>>>
>>>>> Regards
>>>>> JB
>>>>> On 13 August 2018, at 19:14, Lukasz Cwik <[email protected]> wrote:
>>>>>>
>>>>>> I wanted to let you know that I will be continuing from where Eugene
>>>>>> left off with SplittableDoFn. I know that many of you have done a bunch
>>>>>> of work with IOs and/or runner integration for SplittableDoFn, and I
>>>>>> would appreciate your help in advancing this awesome idea. If you have
>>>>>> questions or things you want to get reviewed related to SplittableDoFn,
>>>>>> feel free to send them my way or include me on anything SplittableDoFn
>>>>>> related.
>>>>>>
>>>>>> I was part of several discussions with Eugene and I think the biggest
>>>>>> outstanding design portion is to figure out how dynamic work rebalancing
>>>>>> would play out with the portability APIs. This includes reporting of
>>>>>> progress from within a bundle. I know that Eugene had shared some
>>>>>> documents in this regard, but the position / split models didn't work
>>>>>> too cleanly in a unified sense for bounded and unbounded
>>>>>> SplittableDoFns. It will likely take me a while to gather my thoughts,
>>>>>> but I could use your expertise as to how compatible these ideas are
>>>>>> with respect to IOs and runners (Flink/Spark/Dataflow/Samza/Apex/...),
>>>>>> and obviously your help during implementation.
>>>>>>
>>>>>
