Thanks to everyone who has shown interest in this topic through the questions already asked on the doc, and to those interested in having this discussion. I have set up this Doodle poll so people can provide their availability: https://doodle.com/poll/nrw7w84255xnfwqy
I'll send out the chosen time, based on people's availability, along with a Hangout link by end of day Friday, so please mark your availability using the link above.

The agenda of the meeting will be as follows:
* Overview of the proposal
* Enumerate and discuss/answer questions brought up in the meeting

Note that all questions and any discussions/answers provided will be added to the doc for those who are unable to attend.

On Fri, Aug 31, 2018 at 9:47 AM Jean-Baptiste Onofré wrote:

> +1
>
> Regards
> JB
> On Aug 31, 2018, at 18:22, Lukasz Cwik wrote:
>>
>> That is possible. I'll take people's date/time suggestions and create a
>> simple online poll with them.
>>
>> On Fri, Aug 31, 2018 at 2:22 AM Robert Bradshaw wrote:
>>
>>> Thanks for taking this up. I added some comments to the doc. A
>>> European-friendly time for discussion would be great.
>>>
>>> On Fri, Aug 31, 2018 at 3:14 AM Lukasz Cwik wrote:
>>>
>>>> I came up with a proposal[1] for a progress model based solely on the
>>>> backlog, in which splits are based upon the remaining backlog we want
>>>> the SDK to split at. I also give recommendations to runner authors as
>>>> to how an autoscaling system could work based upon the measured
>>>> backlog. A lot of the discussion around progress reporting and
>>>> splitting in the past has been about finding an optimal solution.
>>>> After reading a lot of information about work stealing, I don't
>>>> believe there is a general solution, and it really is up to
>>>> SplittableDoFns to be well behaved. I did not do much work in
>>>> classifying what a well-behaved SplittableDoFn is, though. Much of
>>>> this work builds off ideas that Eugene had documented in the past[2].
>>>>
>>>> I could use the community's wide knowledge of different I/Os to see if
>>>> computing the backlog is practical in the way that I'm suggesting and
>>>> to gather people's feedback.
>>>>
>>>> If there is a lot of interest, I would like to hold a community video
>>>> conference between Sept 10th and 14th about this topic. Please reply
>>>> with your availability by Sept 6th if you're interested.
>>>>
>>>> 1: https://s.apache.org/beam-bundles-backlog-splitting
>>>> 2: https://s.apache.org/beam-breaking-fusion
>>>>
>>>> On Mon, Aug 13, 2018 at 10:21 AM Jean-Baptiste Onofré wrote:
>>>>
>>>>> Awesome!
>>>>>
>>>>> Thanks Luke!
>>>>>
>>>>> I plan to work with you and others on this one.
>>>>>
>>>>> Regards
>>>>> JB
>>>>> On Aug 13, 2018, at 19:14, Lukasz Cwik wrote:
>>>>>>
>>>>>> I wanted to reach out to say that I will be continuing from where
>>>>>> Eugene left off with SplittableDoFn. I know that many of you have
>>>>>> done a bunch of work with IOs and/or runner integration for
>>>>>> SplittableDoFn, and I would appreciate your help in advancing this
>>>>>> awesome idea. If you have questions or things you want to get
>>>>>> reviewed related to SplittableDoFn, feel free to send them my way or
>>>>>> include me on anything SplittableDoFn related.
>>>>>>
>>>>>> I was part of several discussions with Eugene, and I think the
>>>>>> biggest outstanding design portion is to figure out how dynamic work
>>>>>> rebalancing would play out with the portability APIs. This includes
>>>>>> reporting of progress from within a bundle. I know that Eugene had
>>>>>> shared some documents in this regard, but the position / split
>>>>>> models didn't work too cleanly in a unified sense for bounded and
>>>>>> unbounded SplittableDoFns. It will likely take me a while to gather
>>>>>> my thoughts, but I could use your expertise as to how compatible
>>>>>> these ideas are with respect to IOs and runners
>>>>>> Flink/Spark/Dataflow/Samza/Apex/... and obviously help during
>>>>>> implementation.
>>>>>>
>>>>>
