@Gyula,

Thank you for your reply; the answer makes perfect sense. I have a 
follow-up question, if that's OK.

IIUC, this FLIP uses metrics that relate to backpressure at the operator 
level (records in vs. out, busy time, etc.).
Could the FLIP also be used to auto-scale based on state-level metrics at 
the operator level? 

E.g., keyed state growing too large for a given operator and needing to be 
distributed across more instances of that operator.

Cheers,
Pedro

> 
> On 5 Nov 2022, at 09:47, Gyula Fóra <gyula.f...@gmail.com> wrote:
> 
> @JunRui:
> There are two pieces that prevent scaling on minor load variations.
> Firstly, the algorithm/logic is intended to work on metrics averaged over
> a configured time window (let's say the last 5 minutes); this smooths out
> minor variations and results in more stability. Secondly, in addition to
> the utilization target (let's say 75%) there would be a configurable
> flexible bound (for example +/- 10%), and as long as we are within the
> bound we wouldn't trigger scaling. We believe these two together should
> be enough.
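A rough sketch of the windowed-average-plus-bound logic described above 
(illustrative Python; the window size, target, and bound are just the 
example values from this thread, not the FLIP's actual defaults):

```python
from collections import deque

class ScalingDecider:
    """Sketch of the two stabilizers described above: average utilization
    over a time window, and only act when that average leaves a flexible
    band around the target."""

    def __init__(self, window_size=5, target=0.75, bound=0.10):
        self.samples = deque(maxlen=window_size)  # e.g. one sample per minute
        self.target = target                      # e.g. 75% utilization
        self.bound = bound                        # e.g. +/- 10%

    def observe(self, utilization):
        self.samples.append(utilization)

    def decide(self):
        # Wait until the window is full; a short history is too noisy.
        if len(self.samples) < self.samples.maxlen:
            return "no-op"
        avg = sum(self.samples) / len(self.samples)
        if avg > self.target + self.bound:
            return "scale-up"
        if avg < self.target - self.bound:
            return "scale-down"
        return "no-op"  # within the flexible bound: do nothing
```

Both stabilizers matter: the window suppresses a single noisy sample, and 
the band suppresses a sustained but small drift around the target.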
> 
> @Pedro:
> Periodic metric collection is very important for getting a good overview
> of the system at any given time. We also need some recent history to make
> a good scaling decision. Reflecting on your suggestion: by the time input
> lag increases, it is already too late to scale. Ideally, the algorithm
> would pick up on a gradual load increase before it even results in actual
> input lag, so we can always keep our target utilization level.
> 
> Cheers,
> Gyula
> 
> 
> 
>> On Sat, Nov 5, 2022 at 5:16 PM Pedro Silva <pedro.cl...@gmail.com> wrote:
>> 
>> Hello,
>> 
>> First of all, thank you for tackling this theme; it is a massive boon to
>> Flink if it gets in.
>> 
>> Following up on JunRui Lee’s question.
>> Have you considered triggering metric collection based on events rather
>> than on periodic checks?
>> 
>> E.g., if source lag has been increasing for the past x amount of time,
>> trigger a metric collection to understand what to scale, if anything.
>> 
>> For Kubernetes workloads, there is KEDA, which does this:
>> https://keda.sh/docs/2.8/scalers/prometheus/
>> 
>> My apologies if the question doesn’t make sense.
>> 
>> Thank you for your time,
>> Pedro Silva
>> 
>>> 
>>>> On 5 Nov 2022, at 08:09, JunRui Lee <jrlee....@gmail.com> wrote:
>>> 
>>> Hi Max,
>>> 
>>> Thanks for writing this FLIP and initiating the discussion.
>>> 
>>> I just have a small question after reading the FLIP:
>>> 
>>> In the document, I didn't find a definition of when to trigger
>>> autoscaling after some JobVertex reaches the threshold. If I missed it,
>>> please let me know.
>>> IIUC, proper triggering rules are necessary to avoid unnecessary
>>> autoscaling caused by temporary spikes in the data; otherwise, such a
>>> spike will lead to at least two meaningless resubmissions of the job,
>>> which will negatively affect users.
>>> 
>>> Thanks,
>>> JunRui Lee
>>> 
>>> Gyula Fóra <gyula.f...@gmail.com> wrote on Sat, Nov 5, 2022 at 20:38:
>>> 
>>>> Hey!
>>>> Thanks for the input!
>>>> The algorithm does not really differentiate between scaling up and
>>>> scaling down, as it is concerned with finding the right parallelism to
>>>> match the target processing rate with just enough spare capacity.
>>>> Let me try to address your specific points:
>>>> 1. The backlog growth rate only matters for computing the target
>>>> processing rate for the sources. If the parallelism is high enough and
>>>> there is no backpressure, it will be close to 0, so the target rate is
>>>> the source read rate. This is as intended. If we see that the sources
>>>> are not busy and can read more than enough, the algorithm would scale
>>>> them down.
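The relationship described in point 1 might be sketched as follows 
(illustrative Python; this is my reading of the idea, not the FLIP's exact 
formula, and all parameter names are assumptions):

```python
import math

def target_source_parallelism(read_rate, backlog_growth_rate,
                              rate_per_subtask, utilization_target=0.75):
    """Sketch: sources must sustain their current read rate plus the rate
    at which backlog grows; divide that target rate over subtasks while
    reserving spare capacity via the utilization target."""
    # With no backpressure the backlog growth rate is ~0, so the target
    # rate collapses to the observed read rate.
    target_rate = read_rate + max(backlog_growth_rate, 0.0)
    effective_rate_per_subtask = rate_per_subtask * utilization_target
    return math.ceil(target_rate / effective_rate_per_subtask)
```

If the sources are over-provisioned, `read_rate` is low relative to what 
each subtask can handle, the computed parallelism drops, and the same 
formula yields a scale-in rather than needing a separate code path.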
>>>> 2. You are right, it's dangerous to scale in too much, so we have
>>>> already thought about limiting the scale-down amount per scaling
>>>> step/time window to give more safety. But we can definitely think about
>>>> different strategies in the future!
>>>> The observation regarding max parallelism is very important, and we
>>>> should always take it into consideration.
>>>> Cheers
>>>> Gyula
>>>>> On Sat, 5 Nov 2022 at 11:46, Biao Geng <biaoge...@gmail.com> wrote:
>>>>> Hi Max,
>>>>> Thanks a lot for the FLIP. It is an extremely attractive feature!
>>>>> Just some follow up questions/thoughts after reading the FLIP:
>>>>> In the doc, the discussion of the "scaling out" strategy is thorough
>>>>> and convincing to me, but "scaling down" seems less discussed. I have
>>>>> two cents on this aspect:
>>>>> 1. For source parallelisms, if the user configures a much larger value
>>>>> than normal, there should be very few pending records, though the
>>>>> parallelism could be optimized. But IIUC, in the current algorithm, we
>>>>> will not take action in this case, as the backlog growth rate is
>>>>> almost zero. Is this understanding right?
>>>>> 2. Compared with "scaling out", "scaling in" is usually more
>>>>> dangerous, as it is more likely to negatively influence downstream
>>>>> jobs. The min/max load bounds should be useful. I am wondering if it
>>>>> is possible to have a different, more conservative strategy for
>>>>> "scaling in", or, more ambitiously, to allow a custom autoscaling
>>>>> strategy (e.g. a time-based strategy).
>>>>> Another side thought: to recover a job from a checkpoint/savepoint,
>>>>> the new parallelism cannot be larger than the max parallelism defined
>>>>> in the checkpoint (see
>>>>> https://github.com/apache/flink/blob/17a782c202c93343b8884cb52f4562f9c4ba593f/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/Checkpoints.java#L128
>>>>> ).
>>>>> Not sure if this limit should be mentioned in the FLIP.
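The constraint Biao points out amounts to clamping whatever parallelism the 
autoscaler computes (a hypothetical helper, purely illustrative; Flink's 
actual restore-time check lives in the linked Checkpoints.java):

```python
def clamp_parallelism(desired, max_parallelism, min_parallelism=1):
    """Hypothetical helper: a job restored from a checkpoint/savepoint
    cannot exceed the max parallelism recorded in its state, so any
    autoscaler suggestion must stay within [min_parallelism,
    max_parallelism]."""
    return max(min_parallelism, min(desired, max_parallelism))
```

Without such a clamp, a scale-up decision beyond the state's max 
parallelism would fail at restore time rather than being rejected up front.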
>>>>> Again, thanks for the great work, and looking forward to using the
>>>>> Flink K8s operator with it!
>>>>> Best,
>>>>> Biao Geng
>>>>> From: Maximilian Michels <m...@apache.org>
>>>>> Date: Saturday, November 5, 2022 at 2:37 AM
>>>>> To: dev <dev@flink.apache.org>
>>>>> Cc: Gyula Fóra <gyula.f...@gmail.com>, Thomas Weise <t...@apache.org>,
>>>>> Marton Balassi <mbala...@apache.org>, Őrhidi Mátyás <
>>>>> matyas.orh...@gmail.com>
>>>>> Subject: [DISCUSS] FLIP-271: Autoscaling
>>>>> Hi,
>>>>> I would like to kick off the discussion on implementing autoscaling
>>>>> for Flink as part of the Flink Kubernetes operator. I've outlined an
>>>>> approach here which I find promising:
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-271%3A+Autoscaling
>>>>> I've been discussing this approach with some of the operator
>>>>> contributors: Gyula, Marton, Matyas, and Thomas (all in CC). We
>>>>> started prototyping an implementation based on the current FLIP
>>>>> design. If that goes well, we would like to contribute this to Flink
>>>>> based on the results of the discussion here.
>>>>> I'm curious to hear your thoughts.
>>>>> -Max
>> 
