Thanks to everyone who filled out the Doodle poll. The most
popular time was Friday Sept 14th from 11am-noon PST. I'll send out a
calendar invite and meeting link early next week.

I have received a lot of feedback on the document and have addressed some
parts of it including:
* clarifying terminology
* processing skew due to some restrictions having their watermarks much
further behind than others, which affects how runners schedule bundles
* external throttling & I/O wait overhead reporting to make sure we don't
overscale

Areas that still need additional feedback and details are:
* reporting progress around the work that is done and is active
* more examples
* unbounded restrictions being caused by an unbounded number of splits of
existing unbounded restrictions (infinite work growth)
* whether we should be reporting this information at the PTransform level
or at the bundle level



On Wed, Sep 5, 2018 at 1:53 PM Lukasz Cwik <[email protected]> wrote:

> Thanks to everyone who has shown interest in this topic through the
> questions already asked on the doc, and to those interested in having
> this discussion. I have set up this Doodle poll so people can provide
> their availability:
> https://doodle.com/poll/nrw7w84255xnfwqy
>
> I'll send out the chosen time, based upon people's availability, and a
> Hangout link by end of day Friday, so please mark your availability using
> the link above.
>
> The agenda of the meeting will be as follows:
> * Overview of the proposal
> * Enumerate and discuss/answer questions brought up in the meeting
>
> Note that all questions and any discussions/answers provided will be added
> to the doc for those who are unable to attend.
>
> On Fri, Aug 31, 2018 at 9:47 AM Jean-Baptiste Onofré <[email protected]>
> wrote:
>
>> +1
>>
>> Regards
>> JB
>> On Aug 31, 2018, at 18:22, Lukasz Cwik <[email protected]> wrote:
>>>
>>> That is possible, I'll take people's date/time suggestions and create a
>>> simple online poll with them.
>>>
>>> On Fri, Aug 31, 2018 at 2:22 AM Robert Bradshaw <[email protected]>
>>> wrote:
>>>
>>>> Thanks for taking this up. I added some comments to the doc. A
>>>> European-friendly time for discussion would be great.
>>>>
>>>> On Fri, Aug 31, 2018 at 3:14 AM Lukasz Cwik <[email protected]> wrote:
>>>>
>>>>> I came up with a proposal[1] for a progress model based solely on the
>>>>> backlog, in which splits are based upon the remaining backlog we want the
>>>>> SDK to split at. I also give recommendations to runner authors as to how
>>>>> an autoscaling system could work based upon the measured backlog. Many
>>>>> past discussions around progress reporting and splitting have been about
>>>>> finding an optimal solution. After reading a lot of information about
>>>>> work stealing, I don't believe there is a general solution, and it really
>>>>> is up to SplittableDoFns to be well behaved. I did not do much work in
>>>>> classifying what a well-behaved SplittableDoFn is, though. Much of this
>>>>> work builds off ideas that Eugene had documented in the past[2].
>>>>>
>>>>> I could use the community's wide knowledge of different I/Os to see if
>>>>> computing the backlog is practical in the way that I'm suggesting and to
>>>>> gather people's feedback.
>>>>>
>>>>> If there is a lot of interest, I would like to hold a community video
>>>>> conference between Sept 10th and 14th about this topic. Please reply with
>>>>> your availability by Sept 6th if you're interested.
>>>>>
>>>>> 1: https://s.apache.org/beam-bundles-backlog-splitting
>>>>> 2: https://s.apache.org/beam-breaking-fusion
>>>>>
>>>>> On Mon, Aug 13, 2018 at 10:21 AM Jean-Baptiste Onofré <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Awesome!
>>>>>>
>>>>>> Thanks Luke!
>>>>>>
>>>>>> I plan to work with you and others on this one.
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>> On Aug 13, 2018, at 19:14, Lukasz Cwik <[email protected]> wrote:
>>>>>>>
>>>>>>> I wanted to reach out that I will be continuing from where Eugene
>>>>>>> left off with SplittableDoFn. I know that many of you have done a bunch
>>>>>>> of work with IOs and/or runner integration for SplittableDoFn and would
>>>>>>> appreciate your help in advancing this awesome idea. If you have
>>>>>>> questions or things you want to get reviewed related to SplittableDoFn,
>>>>>>> feel free to send them my way or include me on anything SplittableDoFn
>>>>>>> related.
>>>>>>>
>>>>>>> I was part of several discussions with Eugene and I think the
>>>>>>> biggest outstanding design portion is to figure out how dynamic work
>>>>>>> rebalancing would play out with the portability APIs. This includes
>>>>>>> reporting of progress from within a bundle. I know that Eugene had
>>>>>>> shared some documents in this regard, but the position / split models
>>>>>>> didn't work too cleanly in a unified sense for bounded and unbounded
>>>>>>> SplittableDoFns. It will likely take me a while to gather my thoughts,
>>>>>>> but I could use your expertise as to how compatible these ideas are
>>>>>>> with respect to IOs and runners (Flink/Spark/Dataflow/Samza/Apex/...),
>>>>>>> and obviously help during implementation.
>>>>>>>
>>>>>>
