That makes sense. I suggest we add a note to the FLIP to avoid confusion.

On Wed, Sep 18, 2019 at 9:51 AM Xintong Song <tonysong...@gmail.com> wrote:

> @tao
>
> I don't think we can limit the CPU usage of a slot, nor isolate the usage
> between slots. We do have CPU limits for the task executor in some
> scenarios, such as on YARN with strict cgroup mode.
>
> The purpose of bookkeeping and dynamically allocating CPU cores is to
> prevent scheduling tasks with too much computation load onto a task
> executor, rather than to limit the CPU usage of each slot.
>
> Thank you~
>
> Xintong Song
>
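Xintong's distinction between bookkeeping and enforcement can be illustrated with a minimal sketch. It is written in Python purely for illustration; `CpuLedger` and its methods are hypothetical names, not Flink classes. The ledger admits or rejects slot allocations against the executor's total cores, but nothing in it pauses a thread that uses more CPU than its slot requested.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CpuLedger:
    """Hypothetical CPU bookkeeping for a single task executor.

    It only records how many cores have been promised to slots. It does
    not pause or throttle a thread that uses more CPU than it requested.
    """

    total_cores: float
    allocated: Dict[str, float] = field(default_factory=dict)  # slot id -> cores

    def available_cores(self) -> float:
        return self.total_cores - sum(self.allocated.values())

    def try_allocate(self, slot_id: str, cores: float) -> bool:
        # Admission control: refuse a request that would overcommit the
        # executor, so too much computation load is never scheduled here.
        if slot_id in self.allocated or cores > self.available_cores():
            return False
        self.allocated[slot_id] = cores
        return True

    def release(self, slot_id: str) -> None:
        self.allocated.pop(slot_id, None)


if __name__ == "__main__":
    ledger = CpuLedger(total_cores=4.0)
    print(ledger.try_allocate("slot-1", 1.5))  # True
    print(ledger.try_allocate("slot-2", 2.0))  # True
    print(ledger.try_allocate("slot-3", 1.0))  # False: only 0.5 cores left
```

Any hard cap on actual usage would have to come from outside, such as the YARN strict cgroup mode mentioned above, and would then apply to the whole task executor rather than to individual slots.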
>
>
> On Wed, Sep 18, 2019 at 12:18 AM tao xiao <xiaotao...@gmail.com> wrote:
>
> > Sorry if this question has been addressed before; please point me
> > to the reference.
> >
> > How do we limit the CPU usage of a slot? Does the thread that executes
> > the slot get paused when it uses more CPU cycles than it requested?
> >
> > On Tue, Sep 17, 2019 at 10:23 PM Xintong Song <tonysong...@gmail.com>
> > wrote:
> >
> > > Thanks for the feedback, Andrey.
> > >
> > > I'll start the vote.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Tue, Sep 17, 2019 at 10:09 PM Andrey Zagrebin <azagre...@apache.org> wrote:
> > >
> > > > Thanks for the update @Xintong.
> > > > I would be ok with starting the vote.
> > > >
> > > > Best,
> > > > Andrey
> > > >
> > > > On Tue, Sep 17, 2019 at 6:12 AM Xintong Song <tonysong...@gmail.com>
> > > > wrote:
> > > >
> > > > > The implementation plan [1] has been updated with the following changes:
> > > > >
> > > > >    - Add the default slot resource profile to
> > > > >    ResourceManagerGateway#registerTaskExecutor rather than
> > > > >    #sendSlotReport.
> > > > >    - Swap the steps 'TaskExecutor derive and register with default
> > > > >    slot resource profile' and 'Extend TaskExecutor to support dynamic
> > > > >    slot allocation'.
> > > > >    - Add a step for updating the REST API / Web UI.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > > [1]
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
> > > > >
> > > > > On Tue, Sep 17, 2019 at 11:49 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > >
> > > > > > @Till
> > > > > > Thanks for the reminder. I'll add a step for updating the web UI. I'll
> > > > > > try to involve Lining to help us with this step.
> > > > > >
> > > > > > @Andrey
> > > > > > I was thinking that after we define the RM-TM interfaces in step 2,
> > > > > > it would be good to work on the RM and TM sides concurrently. But
> > > > > > yes, if we finish step 4 early, it would make step 6 easier. We can
> > > > > > start to have some IT/E2E tests once the default slot resource
> > > > > > profiles are available.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Sep 16, 2019 at 9:50 PM Andrey Zagrebin <and...@ververica.com> wrote:
> > > > > >
> > > > > >> @Xintong
> > > > > >>
> > > > > >> Thanks for the feedback.
> > > > > >>
> > > > > >> Just to clarify step 6:
> > > > > >> If the first point is done before step 5 (e.g. as part of step 4),
> > > > > >> then it is just keeping the info about the default slot in the RM's
> > > > > >> data structure associated with the TM, with no real change in the
> > > > > >> behaviour.
> > > > > >> When this info is available, I think it can be used straightforwardly
> > > > > >> during step 5, where we get either a concrete slot requirement or an
> > > > > >> unknown one (step 6, point 2), which simply grabs one of the concrete
> > > > > >> default ones (btw, it is not clear which one; just a random one?).
> > > > > >>
> > > > > >> For steps 5 and 7, true, it is not quite clear whether we can avoid
> > > > > >> some split, e.g. after step 5 and before doing step 7.
> > > > > >> I agree that we should introduce the feature flag if we clearly see
> > > > > >> that it would be a bigger effort without the flag.
> > > > > >>
> > > > > >> Best,
> > > > > >> Andrey
> > > > > >>
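Andrey's point about step 6 can be sketched as follows, assuming the RM learns each TM's default slot profile at registration (the `registerTaskExecutor` route agreed on in this thread). Everything here is hypothetical: `ResourceManagerSketch`, the `(cpu_millis, memory_mb)` tuples, and the first-fit choice of TM. Which TM's default slot an unknown requirement should grab is exactly the open question Andrey raises, so first fit is only an arbitrary placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical resource tuple: (cpu_millis, memory_mb).
Profile = Tuple[int, int]


@dataclass
class RegisteredTaskExecutor:
    default_slot: Profile
    available: Profile


@dataclass
class ResourceManagerSketch:
    executors: Dict[str, RegisteredTaskExecutor] = field(default_factory=dict)

    def register_task_executor(self, tm_id: str, total: Profile,
                               default_slot: Profile) -> None:
        # The default slot profile arrives once, with the registration,
        # instead of being repeated in every slot report.
        self.executors[tm_id] = RegisteredTaskExecutor(default_slot, total)

    def allocate(self, requirement: Optional[Profile]) -> Optional[Tuple[str, Profile]]:
        """Serve a concrete requirement as-is; an unknown one (None) is
        turned into the chosen TM's default slot profile."""
        for tm_id, tm in self.executors.items():  # first fit: arbitrary policy
            wanted = tm.default_slot if requirement is None else requirement
            cpu, mem = tm.available
            if wanted[0] <= cpu and wanted[1] <= mem:
                tm.available = (cpu - wanted[0], mem - wanted[1])
                return tm_id, wanted
        return None


if __name__ == "__main__":
    rm = ResourceManagerSketch()
    rm.register_task_executor("tm-1", total=(2000, 2048), default_slot=(1000, 1024))
    rm.register_task_executor("tm-2", total=(4000, 4096), default_slot=(2000, 2048))
    print(rm.allocate(None))         # ('tm-1', (1000, 1024))
    print(rm.allocate((1500, 512)))  # ('tm-2', (1500, 512))
```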
> > > > > >> On Mon, Sep 16, 2019 at 3:21 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > > > > >>
> > > > > >> > One thing which was briefly mentioned in the FLIP but not in the
> > > > > >> > implementation plan is the update of the web UI. I think it is worth
> > > > > >> > adding an extra item for updating the web UI to properly display the
> > > > > >> > resources a TM still has to offer with dynamic slot allocation. I
> > > > > >> > guess we need to pull in some JavaScript help in order to implement
> > > > > >> > this step.
> > > > > >> >
> > > > > >> > Cheers,
> > > > > >> > Till
> > > > > >> >
> > > > > >> > On Mon, Sep 16, 2019 at 2:15 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > >> >
> > > > > >> > > Thanks for the comments, Andrey.
> > > > > >> > >
> > > > > >> > > - I agree that instead of ResourceManagerGateway#sendSlotReport,
> > > > > >> > > we should add the default slot resource profile to
> > > > > >> > > ResourceManagerGateway#registerTaskExecutor.
> > > > > >> > >
> > > > > >> > > - If I understand correctly, the reason you suggest doing the
> > > > > >> > > default slot resource profile first, and then doing step 3 in a
> > > > > >> > > way that supports both TaskExecutorGateway#requestSlot and
> > > > > >> > > TaskExecutorGateway#requestResource, is to try to avoid splitting
> > > > > >> > > code paths with the feature option? I think we can do that, but I
> > > > > >> > > also want to point out that this can only reduce the code split by
> > > > > >> > > the feature option (which is good), not eliminate it. We still need
> > > > > >> > > the feature option for the fundamental differences, e.g. creating
> > > > > >> > > new SlotIDs on allocation vs. allocating to free slots with
> > > > > >> > > existing SlotIDs.
> > > > > >> > >
> > > > > >> > > - I don't really think we can do steps 5, 6 and 7 independently.
> > > > > >> > > Basically, they all make changes to the same component. We can
> > > > > >> > > probably do steps 6 and 7 independently, but I think they both
> > > > > >> > > depend on step 5.
> > > > > >> > >
> > > > > >> > > In general, I would say it's good to have as little code as
> > > > > >> > > possible split by the feature option, which makes the later
> > > > > >> > > clean-up easier. But if that cannot be done easily, I would rather
> > > > > >> > > not put too much effort into a good abstraction and deduplication
> > > > > >> > > between the new code path and the original one that we are going
> > > > > >> > > to remove soon.
> > > > > >> > >
> > > > > >> > > What do you think?
> > > > > >> > >
> > > > > >> > > Thank you~
> > > > > >> > >
> > > > > >> > > Xintong Song
> > > > > >> > >
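The "fundamental difference" Xintong mentions, pre-created SlotIDs vs. SlotIDs minted per allocation, is what keeps the feature option from disappearing entirely. Below is a hypothetical Python sketch of the two models behind one switch; the class names, slot sizes and `make_slot_table` are illustrative assumptions, not Flink code.

```python
import itertools
from typing import List, Optional, Union


class StaticSlotTable:
    """Existing model: a fixed number of equally sized slots whose
    SlotIDs exist up front; allocation hands out a free one."""

    def __init__(self, num_slots: int, slot_cpu_millis: int) -> None:
        self.slot_cpu_millis = slot_cpu_millis
        self.free: List[str] = [f"slot-{i}" for i in range(num_slots)]

    def allocate(self, cpu_millis: int) -> Optional[str]:
        # The request must fit into one pre-sized slot; no new SlotID is made.
        if cpu_millis > self.slot_cpu_millis or not self.free:
            return None
        return self.free.pop(0)


class DynamicSlotTable:
    """Dynamic model: no slot exists until requested; each allocation
    carves the requested amount out of the pool under a fresh SlotID."""

    def __init__(self, total_cpu_millis: int) -> None:
        self.remaining = total_cpu_millis
        self._ids = itertools.count()

    def allocate(self, cpu_millis: int) -> Optional[str]:
        if cpu_millis > self.remaining:
            return None
        self.remaining -= cpu_millis
        return f"dyn-slot-{next(self._ids)}"


def make_slot_table(dynamic_enabled: bool) -> Union[StaticSlotTable, DynamicSlotTable]:
    # Hypothetical feature option choosing one of the two code paths.
    if dynamic_enabled:
        return DynamicSlotTable(total_cpu_millis=4000)
    return StaticSlotTable(num_slots=4, slot_cpu_millis=1000)


if __name__ == "__main__":
    static = make_slot_table(dynamic_enabled=False)
    print(static.allocate(500), static.allocate(1500))  # slot-0 None
    dynamic = make_slot_table(dynamic_enabled=True)
    print(dynamic.allocate(1500), dynamic.allocate(2500))  # dyn-slot-0 dyn-slot-1
```

Both tables expose the same `allocate` call, yet the bookkeeping underneath cannot be shared, which is why some code split by the feature option remains.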
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > On Mon, Sep 16, 2019 at 5:59 PM Andrey Zagrebin <and...@ververica.com> wrote:
> > > > > >> > >
> > > > > >> > > > Hi Xintong,
> > > > > >> > > >
> > > > > >> > > > Thanks for sharing the implementation steps. I also think they
> > > > > >> > > > make sense with the feature option.
> > > > > >> > > >
> > > > > >> > > > I was wondering if we could order the steps in a way that each
> > > > > >> > > > change does not affect other components too much, always keeping
> > > > > >> > > > a working system; then maybe the feature option does not always
> > > > > >> > > > need to split the code. Here are some thoughts.
> > > > > >> > > >
> > > > > >> > > > - We could do the default slot profile first and include it in the
> > > > > >> > > > TM registration. I would suggest adding it
> > > > > >> > > > to ResourceManagerGateway#registerTaskExecutor, not sendSlotReport.
> > > > > >> > > >   This way the RM knows about it but does not use it at this
> > > > > >> > > >   point. (parts of steps 4 and 6)
> > > > > >> > > >
> > > > > >> > > > - We could try to do step 3 first, in a way that it also supports
> > > > > >> > > > the current way of allocation in TaskExecutorGateway#requestSlot
> > > > > >> > > > with the default slot profile
> > > > > >> > > >   and sends reports both with available resources and with free
> > > > > >> > > >   default slots which correspond to the available resources. We
> > > > > >> > > >   can just remove the free default slots later.
> > > > > >> > > >   The new TaskExecutorGateway#requestResource could also be
> > > > > >> > > >   implemented here but not used yet.
> > > > > >> > > >
> > > > > >> > > > - Then step 5 can use the new TaskExecutorGateway#requestResource
> > > > > >> > > > and the default slot profile.
> > > > > >> > > >
> > > > > >> > > > - I am not sure whether steps 5 and 7 can be implemented
> > > > > >> > > > independently without regressing what we have. Maybe if we do
> > > > > >> > > > step 7 first, there will be only default slots at first, which
> > > > > >> > > > will simplify step 5 later.
> > > > > >> > > >
> > > > > >> > > > Best,
> > > > > >> > > > Andrey
> > > > > >> > > >
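Andrey's suggestion that the TM report both its available resources and the free default slots corresponding to them boils down to a small computation. A sketch under assumed names (`Resources`, `build_slot_report`, and the units of milli-cores and MiB are all hypothetical): the number of free default slots is limited by whichever resource dimension runs out first.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource vector: CPU in milli-cores, memory in MiB."""

    cpu_millis: int
    memory_mb: int


def free_default_slots(available: Resources, default_slot: Resources) -> int:
    # A default slot needs every dimension, so the scarcest one decides.
    return min(available.cpu_millis // default_slot.cpu_millis,
               available.memory_mb // default_slot.memory_mb)


def build_slot_report(total: Resources, allocated: Sequence[Resources],
                      default_slot: Resources) -> Dict[str, object]:
    """Report the remaining resources plus the equivalent number of free
    default slots, so an RM that only understands default slots still works."""
    available = Resources(
        total.cpu_millis - sum(r.cpu_millis for r in allocated),
        total.memory_mb - sum(r.memory_mb for r in allocated),
    )
    return {
        "available": available,
        "free_default_slots": free_default_slots(available, default_slot),
    }


if __name__ == "__main__":
    report = build_slot_report(
        total=Resources(4000, 4096),
        allocated=[Resources(1500, 512)],
        default_slot=Resources(1000, 1024),
    )
    print(report["available"])           # Resources(cpu_millis=2500, memory_mb=3584)
    print(report["free_default_slots"])  # 2
```

Dropping the `free_default_slots` entry later, as Andrey proposes, would leave the resource-based part of the report untouched.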
> > > > > >> > > > On Mon, Sep 16, 2019 at 5:53 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > >> > > >
> > > > > >> > > > > Thanks for the comments, Till and Wenlong.
> > > > > >> > > > >
> > > > > >> > > > > @Wenlong
> > > > > >> > > > > Regarding slot sharing, the general idea is to request a slot
> > > > > >> > > > > with resources for the tasks of the entire slot sharing group.
> > > > > >> > > > > Details on how to decide the slot sharing groups and how to
> > > > > >> > > > > manage task resources within the shared slots can be found in
> > > > > >> > > > > FLIP-53 [1].
> > > > > >> > > > >
> > > > > >> > > > > Thank you~
> > > > > >> > > > >
> > > > > >> > > > > Xintong Song
> > > > > >> > > > >
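For the slot sharing answer above, a rough sketch of "one slot request per slot sharing group, sized for all of its tasks". Summing the task resources is a simplifying assumption made here; FLIP-53 is where the actual rules are defined. `Task` and `slot_requests` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Task:
    name: str
    sharing_group: str
    cpu_millis: int
    memory_mb: int


def slot_requests(tasks: Iterable[Task]) -> Dict[str, Tuple[int, int]]:
    """One slot request per slot sharing group, sized for all its tasks.

    Summing the task resources is a simplifying assumption for this sketch.
    """
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for task in tasks:
        totals[task.sharing_group][0] += task.cpu_millis
        totals[task.sharing_group][1] += task.memory_mb
    return {group: (cpu, mem) for group, (cpu, mem) in totals.items()}


if __name__ == "__main__":
    tasks = [
        Task("source", "g1", 500, 256),
        Task("map", "g1", 1000, 512),
        Task("sink", "g2", 250, 128),
    ]
    print(slot_requests(tasks))  # {'g1': (1500, 768), 'g2': (250, 128)}
```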
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > On Mon, Sep 16, 2019 at 10:42 AM wenlong.lwl <wenlong88....@gmail.com> wrote:
> > > > > >> > > > >
> > > > > >> > > > > > Hi Xintong, thanks for the great proposal. Big +1 for the
> > > > > >> > > > > > feature! It is something like going from MapReduce 1.0 to
> > > > > >> > > > > > MapReduce 2.0.
> > > > > >> > > > > >
> > > > > >> > > > > > I like the design on the whole. One point may need to be
> > > > > >> > > > > > included in the proposal: how do we deal with slot sharing
> > > > > >> > > > > > groups under dynamic slot allocation? Slot sharing can be
> > > > > >> > > > > > quite different with dynamic slot allocation.
> > > > > >> > > > > >
> > > > > >> > > > > > On Fri, 13 Sep 2019 at 16:42, Till Rohrmann <trohrm...@apache.org> wrote:
> > > > > >> > > > > >
> > > > > >> > > > > > > Thanks for the update, Xintong. From a high-level perspective,
> > > > > >> > > > > > > the implementation plan looks good to me.
> > > > > >> > > > > > >
> > > > > >> > > > > > > Cheers,
> > > > > >> > > > > > > Till
> > > > > >> > > > > > >
> > > > > >> > > > > > > On Thu, Sep 12, 2019 at 11:04 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > >> > > > > > >
> > > > > >> > > > > > > > Added implementation steps for this FLIP on the wiki page [1].
> > > > > >> > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Thank you~
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Xintong Song
> > > > > >> > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > [1]
> > > > > >> > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
> > > > > >> > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > On Tue, Aug 20, 2019 at 3:43 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > > @Zili
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > As far as I know, Timo is drafting a FLIP that has taken the
> > > > > >> > > > > > > > > number 55. There is a counter maintained on the FLIP wiki
> > > > > >> > > > > > > > > page [1] that shows which number should be used for the next
> > > > > >> > > > > > > > > FLIP; whoever takes a number for a new FLIP should increase it.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Thank you~
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Xintong Song
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > [1]
> > > > > >> > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > On Tue, Aug 20, 2019 at 3:28 AM Zili Chen <wander4...@gmail.com> wrote:
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >> We suddenly skipped FLIP-55 lol.
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >> On Mon, Aug 19, 2019 at 10:23 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >> > Hi everyone,
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > We would like to start a discussion thread on "FLIP-56: Dynamic
> > > > > >> > > > > > > > >> > Slot Allocation" [1]. This was originally part of the discussion
> > > > > >> > > > > > > > >> > thread for "FLIP-53: Fine Grained Resource Management" [2]. As
> > > > > >> > > > > > > > >> > Till suggested, we would like to split the original discussion
> > > > > >> > > > > > > > >> > into two topics, and start a separate discussion thread as well
> > > > > >> > > > > > > > >> > as a separate FLIP process for this one.
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > Thank you~
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > Xintong Song
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > [1]
> > > > > >> > > > > > > > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >> > [2]
> > > > > >> > > > > > > > >> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-53-Fine-Grained-Resource-Management-td31831.html
> > > > > >> > > > > > > > >> >
> > > > > >> > > > > > > > >>
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Regards,
> > Tao
> >
>


-- 
Regards,
Tao
