+1

Regards
JB

On Aug 31, 2018, at 18:22, Lukasz Cwik <[email protected]> wrote:
> That is possible. I'll take people's date/time suggestions and create a
> simple online poll with them.
>
> On Fri, Aug 31, 2018 at 2:22 AM Robert Bradshaw <[email protected]> wrote:
>
>> Thanks for taking this up. I added some comments to the doc. A
>> European-friendly time for discussion would be great.
>>
>> On Fri, Aug 31, 2018 at 3:14 AM Lukasz Cwik <[email protected]> wrote:
>>
>>> I came up with a proposal[1] for a progress model based solely on the
>>> backlog, in which splits are based upon the remaining backlog we want
>>> the SDK to split at. I also give recommendations to runner authors as
>>> to how an autoscaling system could work based upon the measured
>>> backlog. Many past discussions around progress reporting and splitting
>>> have been about finding an optimal solution. After reading a lot about
>>> work stealing, I don't believe there is a general solution, and it
>>> really is up to SplittableDoFns to be well behaved. I did not do much
>>> work on classifying what a well-behaved SplittableDoFn is, though.
>>> Much of this work builds on ideas that Eugene had documented in the
>>> past[2].
>>>
>>> I could use the community's wide knowledge of different I/Os to see
>>> whether computing the backlog is practical in the way that I'm
>>> suggesting, and to gather people's feedback.
>>>
>>> If there is a lot of interest, I would like to hold a community video
>>> conference between Sept 10th and 14th about this topic. Please reply
>>> with your availability by Sept 6th if you're interested.
>>>
>>> 1: https://s.apache.org/beam-bundles-backlog-splitting
>>> 2: https://s.apache.org/beam-breaking-fusion
>>>
>>> On Mon, Aug 13, 2018 at 10:21 AM Jean-Baptiste Onofré
>>> <[email protected]> wrote:
>>>
>>>> Awesome!
>>>>
>>>> Thanks Luke!
>>>>
>>>> I plan to work with you and others on this one.
>>>>
>>>> Regards
>>>> JB
>>>> On Aug 13, 2018, at 19:14, Lukasz Cwik <[email protected]> wrote:
>>>>>
>>>>> I wanted to let you know that I will be continuing from where Eugene
>>>>> left off with SplittableDoFn. I know that many of you have done a
>>>>> bunch of work with IOs and/or runner integration for SplittableDoFn,
>>>>> and I would appreciate your help in advancing this awesome idea. If
>>>>> you have questions or things you want reviewed related to
>>>>> SplittableDoFn, feel free to send them my way or include me on
>>>>> anything SplittableDoFn related.
>>>>>
>>>>> I was part of several discussions with Eugene, and I think the
>>>>> biggest outstanding design portion is to figure out how dynamic work
>>>>> rebalancing would play out with the portability APIs. This includes
>>>>> reporting of progress from within a bundle. I know that Eugene had
>>>>> shared some documents in this regard, but the position / split models
>>>>> didn't work too cleanly in a unified sense for bounded and unbounded
>>>>> SplittableDoFns. It will likely take me a while to gather my
>>>>> thoughts, but I could use your expertise as to how compatible these
>>>>> ideas are with respect to IOs and runners
>>>>> (Flink/Spark/Dataflow/Samza/Apex/...), and obviously your help during
>>>>> implementation.
>>>>>
>>>>
