pnowojski commented on code in PR #22787:
URL: https://github.com/apache/flink/pull/22787#discussion_r1243937642
##########
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/TaskIOMetricGroup.java:
##########
@@ -182,14 +186,20 @@ public TimerGauge getHardBackPressuredTimePerSecond() {
return hardBackPressuredTimePerSecond;
}
+ public TimerGauge getIoBackPressuredTimePerSecond() {
+ return ioBackPressuredTimePerSecond;
+ }
+
public long getBackPressuredTimeMsPerSecond() {
return getSoftBackPressuredTimePerSecond().getValue()
- + getHardBackPressuredTimePerSecond().getValue();
+ + getHardBackPressuredTimePerSecond().getValue()
+ + getIoBackPressuredTimePerSecond().getValue();
Review Comment:
Should this be added to the back pressured time? IMO it fits busy time better.
If a given subtask is spending lots of time waiting for the changelog, that's a problem of that particular subtask, not a downstream subtask's problem. And that's more or less the distinction between the busy and back pressured states/times. In other words, as currently proposed, a large waiting time for the changelog would interfere with autoscalers. A subtask that's blocked waiting on the changelog should be scaled up, so this time should be reported as busy time imo.
Moreover, as it is right now, aren't you sometimes accounting the changelog time as back pressured (when "soft waiting for changelog" via the availability future) and sometimes as busy (when "hard waiting for changelog", i.e. actually blocked in the changelog code)?
##########
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java:
##########
@@ -597,8 +597,7 @@ protected void processInput(MailboxDefaultAction.Controller controller) throws E
resumeFuture = inputProcessor.getAvailableFuture();
} else if (changelogWriterAvailabilityProvider != null) {
// currently, waiting for changelog availability is reported as busy
- // todo: add new metric (FLINK-24402)
- timer = null;
+ timer = new GaugePeriodTimer(ioMetrics.getIoBackPressuredTimePerSecond());
Review Comment:
> When a task is waiting for changelog availability (but not blocked in writer), it will be reported as idle.
Are you sure? With the code in this PR as it is right now, I think a task blocked on this changelog availability future would be reported as back pressured.
> Won't such a change without adjusting the UI be confusing to the users?
As far as I can tell, this PR will move "waiting for changelog availability" from "busy" to "backpressured" in the UI. From this perspective it should be fine in the UI (the UI only displays `busy/backpressured/idle` and `backpressured = soft + hard`), modulo that we are changing the semantics of those metrics a bit.
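To spell out the arithmetic behind that, a small sketch (assuming, as the web UI roughly does, that busy time is what remains of each second after idle and back pressured time are subtracted; this is a simplified illustration, not the actual UI/REST code) of how folding the changelog wait into back pressure removes it from busy time:
```java
// Simplified per-second bucket accounting, illustrative only.
final class UiTimeBucketsSketch {

    static long backPressuredMsPerSecond(long softMs, long hardMs, long changelogWaitMs) {
        // With the PR as proposed, the changelog wait is folded into back pressure ...
        return softMs + hardMs + changelogWaitMs;
    }

    static long busyMsPerSecond(long idleMs, long backPressuredMs) {
        // ... and therefore no longer shows up as busy time in the UI.
        return Math.max(0L, 1000L - idleMs - backPressuredMs);
    }
}
```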
##########
docs/content/docs/ops/metrics.md:
##########
@@ -1717,6 +1717,11 @@ Note that the metrics are only available via reporters.
<td>Maximum recorded duration of a single consecutive period of the task being in the hard back pressure state in the last sampling period. Please check softBackPressuredTimeMsPerSecond and hardBackPressuredTimeMsPerSecond for more information.</td>
<td>Gauge</td>
</tr>
+ <tr>
+ <td>ioBackPressuredTimeMsPerSecond</td>
+ <td>The time (in milliseconds) this task is back pressured by the uploading of Changelog state backend, only positive when Changelog state backend is enabled. Please check 'dstl.dfs.upload.max-in-flight' for more information.</td>
+ <td>Gauge</td>
Review Comment:
I think we should either generalize this description a bit or change the name of the metric to something with `changelog` in it.
Either rename `ioBackPressuredTime` to `ioTime` (vide my other comment) and, instead of saying that this metric only reports changelog time, describe it as "time spent by the task doing IO operations, currently only taking into account the changelog state backend",
or rename the metric to `changelogTime` or `changelogBusyTime` and keep the very detailed description.
?
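For example, with the first option the row could read roughly like this (the `ioTimeMsPerSecond` name just follows the rename suggested above; the wording is only a sketch):
```html
<tr>
  <td>ioTimeMsPerSecond</td>
  <td>The time (in milliseconds) this task spends on IO operations. Currently this only takes the Changelog state backend into account and is only positive when it is enabled. Please check 'dstl.dfs.upload.max-in-flight' for more information.</td>
  <td>Gauge</td>
</tr>
```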
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]