Here is the link to join the discussion:
https://meet.google.com/idc-japs-hwf

Remember that it is this Friday, Sept 14th, from 11am-noon PST.

On Mon, Sep 10, 2018 at 7:30 AM Maximilian Michels wrote:
> Thanks for moving forward with this, Lukasz!
>
> Unfortunately, I can't make it on Friday but I'll sync with somebody on
> the call (e.g. Ryan) about your discussion.
>
> On 08.09.18 02:00, Lukasz Cwik wrote:
> > Thanks to everyone who filled out the doodle poll. The most
> > popular time was Friday Sept 14th from 11am-noon PST. I'll send out a
> > calendar invite and meeting link early next week.
> >
> > I have received a lot of feedback on the document and have addressed
> > some parts of it, including:
> > * clarifying terminology
> > * processing skew due to some restrictions having their watermarks much
> >   further behind than others, affecting scheduling of bundles by runners
> > * external throttling & I/O wait overhead reporting to make sure we
> >   don't overscale
> >
> > Areas that still need additional feedback and details are:
> > * reporting progress around the work that is done and is active
> > * more examples
> > * unbounded restrictions being caused by an unbounded number of splits
> >   of existing unbounded restrictions (infinite work growth)
> > * whether we should be reporting this information at the PTransform
> >   level or at the bundle level
> >
> > On Wed, Sep 5, 2018 at 1:53 PM Lukasz Cwik wrote:
> >
> >   Thanks to all those who have shown interest in this topic through the
> >   questions they have asked on the doc already, and to those
> >   interested in having this discussion. I have set up this doodle to
> >   allow people to provide their availability:
> >   https://doodle.com/poll/nrw7w84255xnfwqy
> >
> >   I'll send out the chosen time based upon people's availability and a
> >   Hangout link by end of day Friday, so please mark your availability
> >   using the link above.
> >
> >   The agenda of the meeting will be as follows:
> >   * Overview of the proposal
> >   * Enumerate and discuss/answer questions brought up in the meeting
> >
> >   Note that all questions and any discussions/answers provided will be
> >   added to the doc for those who are unable to attend.
> >
> >   On Fri, Aug 31, 2018 at 9:47 AM Jean-Baptiste Onofré wrote:
> >
> >     +1
> >
> >     Regards
> >     JB
> >     On Aug 31, 2018, at 18:22, Lukasz Cwik wrote:
> >
> >       That is possible. I'll take people's date/time suggestions
> >       and create a simple online poll with them.
> >
> >       On Fri, Aug 31, 2018 at 2:22 AM Robert Bradshaw wrote:
> >
> >         Thanks for taking this up. I added some comments to the
> >         doc. A European-friendly time for discussion would
> >         be great.
> >
> >         On Fri, Aug 31, 2018 at 3:14 AM Lukasz Cwik wrote:
> >
> >           I came up with a proposal[1] for a progress model
> >           based solely on the backlog, in which splits are
> >           based upon the remaining backlog we want the SDK
> >           to split at. I also give recommendations to
> >           runner authors as to how an autoscaling system could
> >           work based upon the measured backlog. A lot of past
> >           discussion around progress reporting and splitting
> >           has been about finding an optimal solution. After
> >           reading a lot of information about work stealing, I
> >           don't believe there is a general solution, and it
> >           really is up to SplittableDoFns to be well behaved.
> >           I did not do much work in classifying what a well
> >           behaved SplittableDoFn is, though. Much of this work
> >           builds off ideas that Eugene had documented in the
> >           past[2].
> >
> >           I could use the community's wide knowledge of
> >           different I/Os to see if computing the backlog is
> >           practical in the way that I'm suggesting and to
> >           gather people's feedback.
> >
> >           If there is a lot of interest, I would like to hold
> >           a community video conference between Sept 10th and
> >           14th about this topic. Please reply with your
> >           availability by Sept 6th if you're interested.
> >
> >           1: https://s.apache.org/beam-bundles-backlog-splitting
> >           2: https://s.apache.org/beam-breaking-fusion
> >
> >           On Mon, Aug 13, 2018 at 10:21 AM Jean-Baptiste Onofré wrote:
> >
> >             Awesome!
> >
> >             Thanks Luke!
> >
> >             I plan to work with you and others on this one.
> >
> >             Regards
> >             JB
> >             On Aug 13, 2018, at 19:14, Lukasz Cwik wrote:
> >
> >               I wanted to let you know that I will be
> >               continuing from where Eugene left off with
> >               SplittableDoFn. I know that many of you have
> >               done a bunch of work with IOs and/or runner
> >               integration for SplittableDoFn and would
> >               appreciate your help in advancing this
> >               awesome idea. If you have questions or
> >               things you want to get reviewed related to
> >               SplittableDoFn, feel free to send them my
> >               way or include me on anything SplittableDoFn
> >               related.
> >
> >               I was part of several discussions with
> >               Eugene and I think the biggest outstanding
> >               design portion is to figure out how dynamic
> >               work rebalancing would play out with the
> >               portability APIs. This includes reporting of
> >               progress from within a bundle. I know that
> >               Eugene had shared some documents in this
> >               regard but the position / split models
> >               didn't work too cleanly in a unified sense
> >               for bounded and unbounded SplittableDoFns.
> >               It will likely take me a while to gather my
> >               thoughts, but I could use your expertise as to
> >               how compatible these ideas are with respect
> >               to IOs and runners
> >               (Flink/Spark/Dataflow/Samza/Apex/...), and
> >               obviously your help during implementation.
