> If not, what is the difference between the spare resources and redundant
> taskmanagers?

I wasn't aware of this one; good catch! The main difference is that you
don't express the spare resources in terms of slots but in terms of task
managers. Also, those options serve slightly different purposes, and users
configuring the slot manager might not look for another option somewhere else.
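
To make the unit difference concrete, here is a rough sketch in Python (not Flink code). It assumes every task manager offers the same number of slots; the function names and the numbers in the usage lines are made up for illustration.

```python
def redundant_slots(redundant_task_managers: int, slots_per_task_manager: int) -> int:
    """Idle slots implied by keeping whole task managers around as spares."""
    if redundant_task_managers < 0 or slots_per_task_manager <= 0:
        raise ValueError("need a non-negative TM count and a positive slot count")
    return redundant_task_managers * slots_per_task_manager


def task_managers_for_slots(spare_slots: int, slots_per_task_manager: int) -> int:
    """Whole task managers needed to cover a spare capacity given in slots."""
    if spare_slots < 0 or slots_per_task_manager <= 0:
        raise ValueError("need a non-negative slot count and a positive slots-per-TM")
    # Ceiling division: a partially used task manager still has to be started.
    return -(-spare_slots // slots_per_task_manager)


if __name__ == "__main__":
    print(redundant_slots(2, 4))
    print(task_managers_for_slots(5, 4))
```

The second function shows why the two units are not interchangeable: a spare capacity given in slots has to be rounded up to whole task managers.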

> Secondly, IMHO the difference between min-reserved resource and spare
> resources is that we could configure a rather large min-reserved resource

Agreed; in my mind, this boils down to the ability to quickly allocate new
slots (TMs). This might differ between environments though. In most cases,
there should be some time between interactive queries unless they're
submitted programmatically. I can see the value of having both (min + slots
to keep around).
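
A minimal sketch of what "min + slots to keep around" could look like, assuming everything is counted in slots. This is only an illustration of the idea, not the FLIP's design or Flink's implementation; `target_slots` and its parameters are hypothetical names.

```python
def target_slots(required: int, min_slots: int, spare_slots: int, max_slots: int) -> int:
    """Number of slots a cluster would try to hold under this hypothetical policy.

    required    -- slots currently needed by running jobs
    min_slots   -- static floor that is kept even when the cluster is idle
    spare_slots -- idle slots kept on top of current demand
    max_slots   -- hard cap on the cluster size
    """
    if min(required, min_slots, spare_slots) < 0:
        raise ValueError("slot counts must be non-negative")
    if min_slots > max_slots:
        raise ValueError("min_slots must not exceed max_slots")
    # Keep the floor, or demand plus headroom, whichever is larger ...
    wanted = max(min_slots, required + spare_slots)
    # ... but never grow past the cap.
    return min(wanted, max_slots)


if __name__ == "__main__":
    for required in (0, 6, 10, 31):
        print(required, target_slots(required, min_slots=8, spare_slots=2, max_slots=32))
```

With a floor of 8 and 2 spare slots, an idle cluster holds 8 slots, while a cluster with 10 slots in use aims for 12. With `spare_slots=0` the function reduces to a purely static minimum, which leaves no headroom once demand exceeds the floor.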

All in all, I don't have a strong opinion here; it's a significant
improvement either way. This was just the first thing that I thought about
after reading the FLIP.

Best,
D.

On Tue, Oct 3, 2023 at 2:10 PM xiangyu feng <xiangyu...@gmail.com> wrote:

> Hi David,
>
> Thx for your feedback.
>
> First of all, for keeping some spare resources around, do you mean
> 'Redundant TaskManagers'[1]? If not, what is the difference between the
> spare resources and redundant taskmanagers?
>
> Secondly, IMHO the difference between min-reserved resource and spare
> resources is that we could configure a rather large min-reserved resource
> for use cases submitting lots of short-lived jobs concurrently, but we
> don't want to configure a large spare resource since this might double the
> total resource usage and lead to resource waste.
>
> Looking forward to hearing from you.
>
> Regards,
> Xiangyu
>
> [1] https://issues.apache.org/jira/browse/FLINK-18625
>
> On Tue, Oct 3, 2023 at 05:00, David Morávek <d...@apache.org> wrote:
>
> > Hi Xiangyu,
> >
> > The sentiment of the FLIP makes sense, but I keep wondering whether this
> > is the best way to think about the problem. I assume that "interactive
> > session cluster" users always want to keep some spare resources around (up
> > to a configured threshold) to reduce cold start instead of statically
> > configuring the minimum.
> >
> > It's just a tiny change from the original proposal, but it could make all
> > the difference (eliminate overprovisioning, maintain latencies with a
> > growing # of jobs, ..)
> >
> > WDYT?
> >
> > Best,
> > D.
> >
> > On Mon, Sep 25, 2023 at 5:11 PM Jing Ge <j...@ververica.com.invalid>
> > wrote:
> >
> >> Hi Yangze,
> >>
> >> Thanks for the clarification. The example of two batch jobs teaming up
> >> with one streaming job is interesting.
> >>
> >> Best regards,
> >> Jing
> >>
> >> On Wed, Sep 20, 2023 at 7:19 PM Yangze Guo <karma...@gmail.com> wrote:
> >>
> >> > Thanks for the comments, Jing.
> >> >
> >> > > Will the minimum resource configuration also take effect for streaming
> >> > > jobs in application mode?
> >> > > Since it is not recommended to configure slotmanager.number-of-slots.max
> >> > > for streaming jobs, does it make sense to disable it for common streaming
> >> > > jobs? At least disable the check for avoiding the oscillation?
> >> >
> >> > Yes. The minimum resource configuration will only be disabled in
> >> > standalone clusters at the moment. I agree it makes sense to disable
> >> > it for a pure streaming job; however:
> >> > - By default, the minimum resource is configured to 0. If users do not
> >> > proactively set it, either the oscillation check or the minimum
> >> > restriction can be considered as disabled.
> >> > - The minimum resource is a cluster-level configuration rather than a
> >> > job-level configuration. If a user has an application with two batch
> >> > jobs preceding the streaming job, they may also require this
> >> > configuration to accelerate the execution of batch jobs.
> >> >
> >> > WDYT?
> >> >
> >> > Best,
> >> > Yangze Guo
> >> >
> >> > On Thu, Sep 21, 2023 at 4:49 AM Jing Ge <j...@ververica.com.invalid>
> >> > wrote:
> >> > >
> >> > > Hi Xiangyu,
> >> > >
> >> > > Thanks for driving it! There is one thing I am not sure I understand
> >> > > correctly.
> >> > >
> >> > > According to the FLIP: "The minimum resource limitation will be implemented
> >> > > in the DefaultResourceAllocationStrategy of FineGrainedSlotManager.
> >> > >
> >> > > Each time when SlotManager needs to reconcile the cluster resources or
> >> > > fulfill job resource requirements, the DefaultResourceAllocationStrategy
> >> > > will check if the minimum resource requirement has been fulfilled. If it is
> >> > > not, DefaultResourceAllocationStrategy will request new PendingTaskManagers
> >> > > and FineGrainedSlotManager will allocate new worker resources accordingly."
> >> > >
> >> > > "To avoid this oscillation, we need to check the worker number derived from
> >> > > minimum and maximum resource configuration is consistent before starting
> >> > > SlotManager."
> >> > >
> >> > > Will the minimum resource configuration also take effect for streaming jobs
> >> > > in application mode? Since it is not recommended to
> >> > > configure slotmanager.number-of-slots.max for streaming jobs, does it make
> >> > > sense to disable it for common streaming jobs? At least disable the check
> >> > > for avoiding the oscillation?
> >> > >
> >> > > Best regards,
> >> > > Jing
> >> > >
> >> > >
> >> > > On Tue, Sep 19, 2023 at 4:58 PM Chen Zhanghao <zhanghao.c...@outlook.com>
> >> > > wrote:
> >> > >
> >> > > > Thanks for driving this, Xiangyu. We use Session clusters for quick SQL
> >> > > > debugging internally, and found cold-start job submission slow due to the
> >> > > > lack of the exact minimum resource reservation feature proposed here. This
> >> > > > should improve the experience a lot for running short-lived jobs in session
> >> > > > clusters.
> >> > > >
> >> > > > Best,
> >> > > > Zhanghao Chen
> >> > > > ________________________________
> >> > > > From: Yangze Guo <karma...@gmail.com>
> >> > > > Sent: September 19, 2023 13:10
> >> > > > To: xiangyu feng <xiangyu...@gmail.com>
> >> > > > Cc: dev@flink.apache.org <dev@flink.apache.org>
> >> > > > Subject: Re: [Discuss] FLIP-362: Support minimum resource limitation
> >> > > >
> >> > > > Thanks for driving this @Xiangyu. This is a feature that many users
> >> > > > have requested for a long time. +1 for the overall proposal.
> >> > > >
> >> > > > Best,
> >> > > > Yangze Guo
> >> > > >
> >> > > > On Tue, Sep 19, 2023 at 11:48 AM xiangyu feng <xiangyu...@gmail.com>
> >> > > > wrote:
> >> > > > >
> >> > > > > Hi Devs,
> >> > > > >
> >> > > > > I'm opening this thread to discuss FLIP-362: Support minimum resource
> >> > > > > limitation. The design doc can be found at:
> >> > > > > FLIP-362: Support minimum resource limitation
> >> > > > >
> >> > > > > Currently, the Flink cluster only requests Task Managers (TMs) when
> >> > > > > there is a resource requirement, and idle TMs are released after a certain
> >> > > > > period of time. However, in certain scenarios, such as running short-lived
> >> > > > > jobs in a session cluster and scheduling batch jobs stage by stage, we
> >> > > > > need to improve the efficiency of job execution by maintaining a certain
> >> > > > > number of available workers in the cluster all the time.
> >> > > > >
> >> > > > > After discussing this with Yangze, we are introducing this new feature.
> >> > > > > The newly added public options and proposed changes are described in this
> >> > > > > FLIP.
> >> > > > >
> >> > > > > Looking forward to your feedback, thanks.
> >> > > > >
> >> > > > > Best regards,
> >> > > > > Xiangyu
> >> > > > >
