1996fanrui commented on code in PR #726:
URL: https://github.com/apache/flink-kubernetes-operator/pull/726#discussion_r1423346604
##########
flink-autoscaler/src/main/java/org/apache/flink/autoscaler/config/AutoScalerOptions.java:
##########
@@ -222,6 +222,18 @@ private static ConfigOptions.OptionBuilder autoScalerConfig(String key) {
                     .withDescription(
                             "Processing rate increase threshold for detecting ineffective scaling threshold. 0.1 means if we do not accomplish at least 10% of the desired capacity increase with scaling, the action is marked ineffective.");
+    public static final ConfigOption<Double> GC_PRESSURE_THRESHOLD =
+            autoScalerConfig("memory.gc-pressure.threshold")
+                    .doubleType()
+                    .defaultValue(0.3)
+                    .withDescription("Max allowed GC pressure during scaling operations");
+
+    public static final ConfigOption<Double> HEAP_USAGE_THRESHOLD =
+            autoScalerConfig("memory.heap-usage.threshold")
+                    .doubleType()
+                    .defaultValue(0.9)
Review Comment:
> Also keep in mind that this is the average heap usage. With 90% average usage you are extremely likely to be close to out of heap in most cases.
Thanks @gyfora for the clarification!
I guess it's not really the average heap usage, and I want to check with you first. In the `ScalingExecutor#isJobUnderMemoryPressure` method, we check whether `evaluatedMetrics.get(ScalingMetric.HEAP_USAGE).getAverage()` > `conf.get(AutoScalerOptions.HEAP_USAGE_THRESHOLD)`. Intuitively, `getAverage` looks like an average, but the value is computed in two steps (rough sketch after the list):
- Step 1: `ScalingMetrics#computeGlobalMetrics` collects a `HEAP_USAGE` sample at each collection time as `heapUsed.getMax() / heapMax.getMax()`.
  - IIUC, `heapUsed` is an `AggregatedMetric`. When a job has 1000 TaskManagers, if the heap usage of 999 TMs is very low and only one TM is high, we still consider `heapUsed` high at that point in time.
- Step 2: `ScalingMetricEvaluator#evaluateGlobalMetrics` computes `HEAP_USAGE` based on the `metricHistory`.
  - So the `metricHistory` is effectively composed of the highest-heap-usage TM at each of a large number of time points.
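
To make the two steps concrete, here is a rough, self-contained sketch of how I read the computation. The class and method names below are simplified stand-ins, not the actual autoscaler code:

```java
import java.util.List;

// Simplified sketch of the HEAP_USAGE computation as I understand it.
class HeapUsageSketch {

    /** Step 1: one sample per collection, built from the max across all TMs. */
    static double collectHeapUsageSample(double[] heapUsedPerTm, double[] heapMaxPerTm) {
        double usedMax = 0;
        double capacityMax = 0;
        for (double v : heapUsedPerTm) usedMax = Math.max(usedMax, v);
        for (double v : heapMaxPerTm) capacityMax = Math.max(capacityMax, v);
        // A single TM with high heap usage drives the whole sample up.
        return usedMax / capacityMax;
    }

    /** Step 2: average the per-collection samples over the metric history window. */
    static double evaluateHeapUsageAverage(List<Double> history) {
        return history.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN);
    }

    /** The threshold check, with the 0.9 default from this PR. */
    static boolean isJobUnderMemoryPressure(List<Double> history, double heapUsageThreshold) {
        return evaluateHeapUsageAverage(history) > heapUsageThreshold;
    }
}
```

So even though the final comparison uses `getAverage()`, each sample in the history is already a max across TMs, which is the point I wanted to confirm.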
Strictly speaking, both steps have problems:
- Step 1: Java GC runs lazily, not immediately.
  - When a TM's heap usage is high, it may just be that GC has not been triggered yet, which does not necessarily mean the memory pressure is high.
  - This is especially true if heap usage is high for only one TM or a small number of TMs.
- Step 2: Since the data from the first step is unreliable, the average computed in the second step is unreliable as well.
> GC metrics will only be available in Flink 1.19.
I'm not sure: could we sum the GC times of all collectors to get the total GC time? Before 1.19 we already have the detailed GC time for each collector.
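
Just to illustrate the "sum all collectors" idea at the plain JVM level (this is only the underlying MXBean API, not Flink's metric plumbing):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class TotalGcTime {
    public static void main(String[] args) {
        long totalGcTimeMs = 0;
        long totalGcCount = 0;
        // Sum the accumulated collection time/count over every collector (e.g. young + old gen).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime()/getCollectionCount() may return -1 if unsupported.
            totalGcTimeMs += Math.max(0, gc.getCollectionTime());
            totalGcCount += Math.max(0, gc.getCollectionCount());
        }
        System.out.println("total GC time (ms): " + totalGcTimeMs + ", total GC count: " + totalGcCount);
    }
}
```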
> This is a very good point and happens often. I think we could definitely build this logic on top of the newly introduced metrics + scaling history as a follow up. It would probably be a very good addition. (but definitely out of scope for this PR)

Sounds reasonable. As I understand it, it's better to revert the scaling if the job is unhealthy after a scale down. Memory pressure is one type of unhealthiness; failing checkpoints or CPU pressure could be others as well.

Would you mind if I create a JIRA ticket and pick it up? Thanks~
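
To make the follow-up idea a bit more concrete, a very rough sketch of the kind of check I have in mind; all names here are hypothetical, nothing from the current codebase, and the details would belong to the follow-up JIRA:

```java
import java.util.Optional;

/** Hypothetical signals that a job is unhealthy after a scale down. */
enum UnhealthyReason {
    MEMORY_PRESSURE,
    CHECKPOINT_FAILURES,
    CPU_PRESSURE
}

/** Hypothetical hook evaluated during an observation window after scaling. */
interface PostScalingHealthCheck {
    Optional<UnhealthyReason> checkAfterScaleDown();
}

class RevertOnUnhealthySketch {
    static boolean shouldRevertLastScaling(PostScalingHealthCheck check) {
        // If any unhealthy signal shows up after the scale down, prefer reverting
        // to the previous parallelism instead of keeping the new one.
        return check.checkAfterScaleDown().isPresent();
    }
}
```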
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]