Thanks all for the votes. So far, we have
- 4 binding +1 votes (Till, Andrey, Gary and Kurt)
- 1 non-binding +1 vote (Xintong)
- No -1 votes

There are more than 3 binding +1 votes and no -1 votes, and the voting period has passed. According to the community bylaws, I'm glad to announce that FLIP-56 has been approved for adoption by Apache Flink.

Thank you~

Xintong Song

On Tue, Sep 24, 2019 at 7:17 PM Xintong Song <[email protected]> wrote:

> Thanks for the votes, Gary and Kurt.
>
> @Kurt
> Sorry for the confusion. I've added a clarification in the section
> "Unknown Resource Requirement".
>
> And +1 (non-binding) from my side.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Tue, Sep 24, 2019 at 5:35 PM Kurt Young <[email protected]> wrote:
>
>> If possible, I would suggest adding a section to this doc to
>> emphasize that the current design has a prerequisite: each job
>> should have either all of its operators using the unknown resource
>> profile or all of them using a specified amount of resources. This
>> would make the document easier to understand.
>>
>> (I was confused by it and realized this after talking to Xintong
>> offline.)
>>
>> But still, I would +1 for this.
>>
>> Best,
>> Kurt
>>
>>
>> On Mon, Sep 23, 2019 at 10:18 PM Till Rohrmann <[email protected]>
>> wrote:
>>
>> > Thanks for updating the FLIP. It looks good to me.
>> >
>> > +1 (binding)
>> >
>> > Cheers,
>> > Till
>> >
>> > On Mon, Sep 23, 2019 at 4:12 PM Xintong Song <[email protected]>
>> > wrote:
>> >
>> > > @Till @Andrey
>> > >
>> > > Following the comments, I have just updated the FLIP document [1] with
>> > > the following changes:
>> > >
>> > > - Removed SlotID (in the section "Protocol Changes")
>> > > - Updated the implementation steps to reduce separate code paths. As
>> > > far as I can see at the moment, we do not need the feature option. We
>> > > can add it later if we find it necessary during the implementation.
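The prerequisite Kurt describes (a job uses either the unknown resource profile for all of its operators or specified resources for all of them) boils down to an all-or-nothing check. The following is a minimal Java sketch under assumed types: `Operator` and `ResourceSpec` are hypothetical stand-ins rather than Flink classes, and a `null` resource spec is assumed to mean "unknown".

```java
import java.util.List;

// Hypothetical sketch: Operator and ResourceSpec are illustrative types, not Flink classes.
public class ResourceRequirementCheck {

    /** Concrete resource amounts an operator might declare. */
    record ResourceSpec(double cpuCores, int heapMemoryMb) {}

    /** An operator whose resources are either specified or unknown (null). */
    record Operator(String name, ResourceSpec resources) {
        boolean hasUnknownResources() {
            return resources == null;
        }
    }

    /** True if every operator has unknown resources, or none of them does. */
    static boolean isConsistent(List<Operator> operators) {
        long unknown = operators.stream().filter(Operator::hasUnknownResources).count();
        return unknown == 0 || unknown == operators.size();
    }

    public static void main(String[] args) {
        List<Operator> allUnknown = List.of(
                new Operator("source", null),
                new Operator("sink", null));
        List<Operator> mixed = List.of(
                new Operator("source", new ResourceSpec(1.0, 512)),
                new Operator("sink", null));
        System.out.println(isConsistent(allUnknown)); // true
        System.out.println(isConsistent(mixed));      // false
    }
}
```

Under this reading, a job like `mixed` falls outside what the current design supports.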
>> > >
>> > >
>> > > Thank you~
>> > >
>> > > Xintong Song
>> > >
>> > >
>> > > [1]
>> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
>> > >
>> > > On Fri, Sep 20, 2019 at 11:01 AM Xintong Song <[email protected]>
>> > > wrote:
>> > >
>> > > > I'm not sure I understand the implementation plan you suggested
>> > > > correctly. To my understanding, all the steps except for step 5 have
>> > > > to happen in strict order.
>> > > >
>> > > > - The profiles used in step 2 are reported in step 1.
>> > > > - The SlotProfile in TaskExecutorGateway#requestSlot in step 3 comes
>> > > > from the profiles used in step 2.
>> > > > - Only if the RM requests slots from the TM with profiles (step 3) will
>> > > > the TM be able to do the proper bookkeeping (step 4).
>> > > > - Step 5 can be done once we have step 2.
>> > > > - Step 6 relies on both step 4 and step 5, for proper bookkeeping on
>> > > > both the TM and RM sides before enabling non-default profiles.
>> > > >
>> > > > That means we can only work on the steps in the following order:
>> > > > 1-2-3-4-6
>> > > >    \-5-/
>> > > >
>> > > > What I'm trying to achieve with the current plan is to parallelize
>> > > > most of the implementation steps, as follows, so that Andrey and I can
>> > > > work concurrently without blocking each other too much:
>> > > > 1-2-3-4
>> > > >  \5-6-7
>> > > >
>> > > >
>> > > > I also agree that it would be good not to add too much separate code.
>> > > > I would suggest leaving that decision to implementation time. E.g., if,
>> > > > by the time we do the TM-side bookkeeping, the RM side has already
>> > > > implemented requesting slots with profiles, then we do not need to
>> > > > separate the code paths.
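Xintong's reading of the six-step plan can be checked mechanically: assign each step the earliest "wave" in which it could start (one more than the latest wave among its prerequisites); steps that share a wave could proceed concurrently. This Java sketch hard-codes the dependencies described in the message (2 needs 1, 3 needs 2, 4 needs 3, 5 needs 2, 6 needs 4 and 5). The class and method names are illustrative, and the graph is assumed to be acyclic.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: assigns each implementation step the earliest "wave" in which it can start.
// Assumes the dependency graph is acyclic; names are illustrative.
public class StepScheduler {

    /** Maps each step to 0 if it has no prerequisites, else 1 + the largest wave of its prerequisites. */
    static Map<Integer, Integer> earliestWaves(Map<Integer, List<Integer>> dependsOn) {
        Map<Integer, Integer> wave = new TreeMap<>();
        for (int step : dependsOn.keySet()) {
            waveOf(step, dependsOn, wave);
        }
        return wave;
    }

    // Memoized depth-first computation of a single step's wave.
    private static int waveOf(int step, Map<Integer, List<Integer>> dependsOn, Map<Integer, Integer> wave) {
        Integer cached = wave.get(step);
        if (cached != null) {
            return cached;
        }
        int result = 0;
        for (int prerequisite : dependsOn.getOrDefault(step, List.of())) {
            result = Math.max(result, waveOf(prerequisite, dependsOn, wave) + 1);
        }
        wave.put(step, result);
        return result;
    }

    public static void main(String[] args) {
        // Step -> prerequisites, following the ordering described in the thread.
        Map<Integer, List<Integer>> dependsOn = Map.of(
                1, List.of(),
                2, List.of(1),
                3, List.of(2),
                4, List.of(3),
                5, List.of(2),
                6, List.of(4, 5));
        System.out.println(earliestWaves(dependsOn)); // {1=0, 2=1, 3=2, 4=3, 5=2, 6=4}
    }
}
```

Only step 5 shares a wave with another step (step 3), which matches the 1-2-3-4-6 diagram with step 5 as the single parallel branch.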
>> > > >
>> > > >
>> > > > To that end, I think it makes sense to adjust steps 5-7 to first use
>> > > > the default slot resource profiles for all the bookkeeping, and replace
>> > > > them with the requested profiles at the end.
>> > > >
>> > > >
>> > > > What do you think?
>> > > >
>> > > >
>> > > > Thank you~
>> > > >
>> > > > Xintong Song
>> > > >
>> > > >
>> > > >
>> > > > On Thu, Sep 19, 2019 at 7:59 PM Till Rohrmann <[email protected]>
>> > > > wrote:
>> > > >
>> > > >> I think that, apart from points 1 and 3, there are no dependencies
>> > > >> between the RM and TM side changes. Also, I'm not sure whether it makes
>> > > >> sense to split the slot manager changes into the proposed steps 5, 6
>> > > >> and 7.
>> > > >>
>> > > >> I would highly recommend not adding too much duplicate logic or
>> > > >> separate code paths, because they just add blind spots which are
>> > > >> probably not as well tested as the old code paths.
>> > > >>
>> > > >> Cheers,
>> > > >> Till
>> > > >>
>> > > >> On Thu, Sep 19, 2019 at 11:58 AM Xintong Song <[email protected]>
>> > > >> wrote:
>> > > >>
>> > > >> > Thanks for the comments, Till.
>> > > >> >
>> > > >> > - Agree on removing SlotID.
>> > > >> >
>> > > >> > - Regarding the implementation plan, it is true that we could
>> > > >> > possibly reduce the code separated by the feature option. But I think
>> > > >> > that to do so we would need to introduce more dependencies between
>> > > >> > implementation steps. With the current plan, we can easily separate
>> > > >> > the steps on the RM side and the TM side, and start working on them
>> > > >> > concurrently after quickly updating the interfaces in between. The
>> > > >> > feature will come alive when the steps on both the RM and TM sides
>> > > >> > are finished.
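Xintong's proposal to have steps 5-7 first do all bookkeeping with the default slot profile, and only at the end charge the requested profile, can be pictured as a single switch in the SlotManager's accounting. The sketch below is hypothetical: the `TaskExecutorBookkeeping` and `Profile` names, the two resource dimensions, and the `useRequestedProfiles` flag are assumptions, not Flink code. It tracks the free resources of one registered TaskExecutor.

```java
// Hypothetical sketch: Profile, TaskExecutorBookkeeping and the useRequestedProfiles
// switch are illustrative, not Flink APIs.
public class TaskExecutorBookkeeping {

    /** Two resource dimensions, chosen only to keep the example small. */
    record Profile(double cpuCores, long memoryMb) {
        boolean fitsInto(Profile free) {
            return cpuCores <= free.cpuCores() && memoryMb <= free.memoryMb();
        }

        Profile minus(Profile other) {
            return new Profile(cpuCores - other.cpuCores(), memoryMb - other.memoryMb());
        }
    }

    private final Profile defaultSlotProfile;
    private final boolean useRequestedProfiles; // false: charge default profile; true: charge requested profile
    private Profile free;

    TaskExecutorBookkeeping(Profile total, Profile defaultSlotProfile, boolean useRequestedProfiles) {
        this.free = total;
        this.defaultSlotProfile = defaultSlotProfile;
        this.useRequestedProfiles = useRequestedProfiles;
    }

    /** Reserves resources for one slot request; returns false if they do not fit. */
    boolean tryAllocate(Profile requested) {
        Profile charged = useRequestedProfiles ? requested : defaultSlotProfile;
        if (!charged.fitsInto(free)) {
            return false;
        }
        free = free.minus(charged);
        return true;
    }

    Profile free() {
        return free;
    }

    public static void main(String[] args) {
        Profile total = new Profile(4.0, 4096);
        Profile defaultSlot = new Profile(1.0, 1024);
        Profile small = new Profile(0.5, 256);

        TaskExecutorBookkeeping staticMode = new TaskExecutorBookkeeping(total, defaultSlot, false);
        TaskExecutorBookkeeping dynamicMode = new TaskExecutorBookkeeping(total, defaultSlot, true);

        int staticSlots = 0;
        while (staticMode.tryAllocate(small)) {
            staticSlots++;
        }
        int dynamicSlots = 0;
        while (dynamicMode.tryAllocate(small)) {
            dynamicSlots++;
        }
        System.out.println(staticSlots + " vs " + dynamicSlots); // 4 vs 8
    }
}
```

Both modes run the same bookkeeping code and differ only in which profile is charged per request, which is one way to keep a single code path as Till recommends.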
>> > > >> > Since we are planning to have two people (Andrey and I) working on
>> > > >> > this FLIP, I think the current plan is probably more convenient.
>> > > >> >
>> > > >> > Thank you~
>> > > >> >
>> > > >> > Xintong Song
>> > > >> >
>> > > >> >
>> > > >> >
>> > > >> > On Thu, Sep 19, 2019 at 5:09 PM Till Rohrmann <[email protected]>
>> > > >> > wrote:
>> > > >> >
>> > > >> > > Hi Xintong,
>> > > >> > >
>> > > >> > > thanks for starting the vote. The general plan looks good, hence +1
>> > > >> > > from my side. I still have some minor comments one could think about:
>> > > >> > >
>> > > >> > > * As we no longer have predetermined slots on the TaskExecutor, I
>> > > >> > > think we can get rid of the SlotID. Instead, an allocated slot will be
>> > > >> > > identified by the AllocationID and the TaskManager's ResourceID in
>> > > >> > > order to differentiate duplicate registrations.
>> > > >> > > * For the implementation plan, I believe there is only one tiny part
>> > > >> > > of the SlotManager for which we need a separate code path/feature
>> > > >> > > flag, which is how we find a matching slot. Everything else should be
>> > > >> > > possible to implement in a way that works with both dynamic and
>> > > >> > > static slot allocation:
>> > > >> > > 1. Let TMs register with the default slot profile at the RM
>> > > >> > > 2. Change the SlotManager to use reported slot profiles instead of
>> > > >> > > pre-calculated profiles
>> > > >> > > 3. Replace SlotID with SlotProfile in TaskExecutorGateway#requestSlot
>> > > >> > > 4. Extend the TM to support dynamic slot allocation (aka proper
>> > > >> > > bookkeeping) (can happen concurrently with any of steps 2-3)
>> > > >> > > 5. Add bookkeeping to the SlotManager (for pending TMs and registered
>> > > >> > > TMs) but still only use default slot profiles for matching with slot
>> > > >> > > requests
>> > > >> > > 6. Allow matching slot requests with reported resources instead of
>> > > >> > > default slot profiles (here we could use a feature flag to switch
>> > > >> > > between dynamic and static slot allocation)
>> > > >> > >
>> > > >> > > Wdyt?
>> > > >> > >
>> > > >> > > Cheers,
>> > > >> > > Till
>> > > >> > >
>> > > >> > > On Thu, Sep 19, 2019 at 9:45 AM Andrey Zagrebin <[email protected]>
>> > > >> > > wrote:
>> > > >> > >
>> > > >> > > > Hi Xintong,
>> > > >> > > >
>> > > >> > > > Thanks for starting the vote, +1 from my side.
>> > > >> > > >
>> > > >> > > > Best,
>> > > >> > > > Andrey
>> > > >> > > >
>> > > >> > > > On Tue, Sep 17, 2019 at 4:26 PM Xintong Song <[email protected]>
>> > > >> > > > wrote:
>> > > >> > > >
>> > > >> > > > > Hi all,
>> > > >> > > > >
>> > > >> > > > > I would like to start the vote for FLIP-56 [1], on which a
>> > > >> > > > > consensus was reached in the discussion thread [2].
>> > > >> > > > >
>> > > >> > > > > The vote will be open for at least 72 hours. I'll try to close it
>> > > >> > > > > after Sep. 20 15:00 UTC, unless there is an objection or not
>> > > >> > > > > enough votes.
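Till's suggestion to drop SlotID, together with step 4 (dynamic slot allocation on the TM), could look roughly like the following on the TaskExecutor. In this hypothetical sketch, plain strings stand in for ResourceID and AllocationID, memory is the only tracked resource, and a repeated allocation with the same AllocationID is assumed to refer to the existing slot rather than a new one. Including the ResourceID in the key only matters once keys from several TaskExecutors are combined, for example on the ResourceManager side.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: SlotKey replaces SlotID; strings stand in for Flink's
// ResourceID and AllocationID types, and only memory is tracked.
public class DynamicSlotTable {

    /** Identifies an allocated slot by TaskManager ResourceID plus AllocationID. */
    record SlotKey(String resourceId, String allocationId) {}

    private final String resourceId;
    private long freeMemoryMb;
    private final Map<SlotKey, Long> allocated = new HashMap<>();

    DynamicSlotTable(String resourceId, long totalMemoryMb) {
        this.resourceId = resourceId;
        this.freeMemoryMb = totalMemoryMb;
    }

    /**
     * Allocates a slot of the requested size. A repeated request with the same
     * AllocationID is treated as the existing slot and is not charged twice.
     */
    boolean allocate(String allocationId, long memoryMb) {
        SlotKey key = new SlotKey(resourceId, allocationId);
        if (allocated.containsKey(key)) {
            return true;
        }
        if (memoryMb > freeMemoryMb) {
            return false;
        }
        freeMemoryMb -= memoryMb;
        allocated.put(key, memoryMb);
        return true;
    }

    /** Returns the slot's resources to the free pool; unknown AllocationIDs are ignored. */
    void free(String allocationId) {
        Long memoryMb = allocated.remove(new SlotKey(resourceId, allocationId));
        if (memoryMb != null) {
            freeMemoryMb += memoryMb;
        }
    }

    long freeMemoryMb() {
        return freeMemoryMb;
    }

    public static void main(String[] args) {
        DynamicSlotTable tm = new DynamicSlotTable("tm-1", 2048);
        tm.allocate("alloc-a", 512);
        tm.allocate("alloc-a", 512); // duplicate request for the same slot
        tm.allocate("alloc-b", 1024);
        System.out.println(tm.freeMemoryMb()); // 512
        tm.free("alloc-a");
        System.out.println(tm.freeMemoryMb()); // 1024
    }
}
```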
>> > > >> > > > >
>> > > >> > > > > Thank you~
>> > > >> > > > >
>> > > >> > > > > Xintong Song
>> > > >> > > > >
>> > > >> > > > >
>> > > >> > > > > [1]
>> > > >> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
>> > > >> > > > >
>> > > >> > > > > [2]
>> > > >> > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-56-Dynamic-Slot-Allocation-td31960.html
