zhijiangW commented on a change in pull request #10083:
[FLINK-14472][runtime]Implement back-pressure monitor with non-blocking outputs.
URL: https://github.com/apache/flink/pull/10083#discussion_r346110805
##########
File path: docs/monitoring/back_pressure.md
##########
@@ -34,30 +34,28 @@ If you see a **back pressure warning** (e.g. `High`) for a
task, this means that
Take a simple `Source -> Sink` job as an example. If you see a warning for
`Source`, this means that `Sink` is consuming data slower than `Source` is
producing. `Sink` is back pressuring the upstream operator `Source`.
-## Sampling Threads
+## Sampling Tasks
-Back pressure monitoring works by repeatedly taking stack trace samples of
your running tasks. The JobManager triggers repeated calls to
`Thread.getStackTrace()` for the tasks of your job.
+Back pressure monitoring works by repeatedly taking samples of your running
tasks. The JobManager triggers repeated calls to `Task.isBackPressured()` for
the tasks of your job.
<img src="{{ site.baseurl }}/fig/back_pressure_sampling.png"
class="img-responsive">
-<!--
https://docs.google.com/drawings/d/1_YDYGdUwGUck5zeLxJ5Z5jqhpMzqRz70JxKnrrJUltA/edit?usp=sharing
-->
+<!--
https://docs.google.com/drawings/d/1O5Az3Qq4fgvnISXuSf-MqBlsLDpPolNB7EQG7A3dcTk/edit?usp=sharing
-->
-If the samples show that a task Thread is stuck in a certain internal method
call (requesting buffers from the network stack), this indicates that there is
back pressure for the task.
-
-By default, the job manager triggers 100 stack traces every 50ms for each task
in order to determine back pressure. The ratio you see in the web interface
tells you how many of these stack traces were stuck in the internal method
call, e.g. `0.01` indicates that only 1 in 100 was stuck in that method.
+By default, the job manager triggers 100 samples every 50ms for each task in
order to determine back pressure. The ratio you see in the web interface tells
you how many of these samples indicated back pressure, e.g. `0.01`
indicates that only 1 in 100 was back pressured.
- **OK**: 0 <= Ratio <= 0.10
- **LOW**: 0.10 < Ratio <= 0.5
- **HIGH**: 0.5 < Ratio <= 1
-In order to not overload the task managers with stack trace samples, the web
interface refreshes samples only after 60 seconds.
+In order to not overload the task managers with samples, the web interface
refreshes samples only after 60 seconds.
Review comment:
Suggest being more specific here: "with back pressure samples" instead of "with samples".
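
For context, the ratio and the OK/LOW/HIGH thresholds described in this hunk can be sketched as below. This is only an illustration, not Flink's implementation: `BackPressureRatioSketch`, `Level`, `ratio`, and `classify` are hypothetical names, the `BooleanSupplier` merely stands in for a call such as `Task.isBackPressured()`, and it assumes "100 samples every 50ms" means 100 samples spaced 50 ms apart.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the documented sampling ratio; not Flink code.
class BackPressureRatioSketch {

    enum Level { OK, LOW, HIGH }

    /** Takes numSamples probes, delayMillis apart, and returns the back-pressured fraction. */
    static double ratio(BooleanSupplier isBackPressured, int numSamples, long delayMillis)
            throws InterruptedException {
        if (numSamples <= 0) {
            throw new IllegalArgumentException("numSamples must be positive");
        }
        int hits = 0;
        for (int i = 0; i < numSamples; i++) {
            if (isBackPressured.getAsBoolean()) {
                hits++;
            }
            // Wait between samples, but not after the last one.
            if (i < numSamples - 1) {
                Thread.sleep(delayMillis);
            }
        }
        return (double) hits / numSamples;
    }

    /** Maps a ratio to the levels listed in the docs: OK <= 0.10 < LOW <= 0.5 < HIGH <= 1. */
    static Level classify(double ratio) {
        // The negated form also rejects NaN.
        if (!(ratio >= 0.0 && ratio <= 1.0)) {
            throw new IllegalArgumentException("ratio must be in [0, 1]: " + ratio);
        }
        if (ratio <= 0.10) {
            return Level.OK;
        }
        if (ratio <= 0.5) {
            return Level.LOW;
        }
        return Level.HIGH;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in probe that reports back pressure on every 4th sample.
        int[] calls = {0};
        double r = ratio(() -> calls[0]++ % 4 == 0, 100, 0L);
        System.out.println(r + " -> " + classify(r));
    }
}
```

With the documented defaults the call would be `ratio(probe, 100, 50L)`, which takes roughly five seconds per task. That cost is one reason the wording about not overloading the task managers is worth keeping precise.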