[GitHub] [flink] zhuzhurk commented on a diff in pull request #21898: [FLINK-30904][docs] Update the documentation and configuration description of slow task detector.

via GitHub Wed, 08 Feb 2023 22:22:26 -0800


zhuzhurk commented on code in PR #21898:
URL: https://github.com/apache/flink/pull/21898#discussion_r1101022277



##########
docs/content.zh/docs/deployment/speculative_execution.md:
##########
@@ -55,7 +55,12 @@ under the License.
 - [`execution.batch.speculative.max-concurrent-executions`]({{< ref 
"docs/deployment/config" 
>}}#execution-batch-speculative-speculative-max-concurrent-e)
 - [`execution.batch.speculative.block-slow-node-duration`]({{< ref 
"docs/deployment/config" 
>}}#execution-batch-speculative-speculative-block-slow-node)
 
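-You can also tune the following configuration options of the slow task detector:
+Currently, speculative execution detects slow tasks with a slow task detector based on execution time. The detector periodically counts all finished nodes, and once the finished ratio reaches a certain threshold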

Review Comment:
   > a certain threshold
   
   Maybe explain it by adding a link to the config item 
`slow-task-detector.execution-time.baseline-ratio`



##########
docs/content.zh/docs/deployment/speculative_execution.md:
##########
@@ -55,7 +55,12 @@ under the License.
 - [`execution.batch.speculative.max-concurrent-executions`]({{< ref 
"docs/deployment/config" 
>}}#execution-batch-speculative-speculative-max-concurrent-e)
 - [`execution.batch.speculative.block-slow-node-duration`]({{< ref 
"docs/deployment/config" 
>}}#execution-batch-speculative-speculative-block-slow-node)
 
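-You can also tune the following configuration options of the slow task detector:
+Currently, speculative execution detects slow tasks with a slow task detector based on execution time. The detector periodically counts all finished nodes, and once the finished ratio reaches a certain threshold
+the median execution time of the finished nodes is used as the baseline, and a running node whose execution time exceeds the baseline is identified as a slow node. It is worth mentioning that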

Review Comment:
   > the median execution time of the finished nodes is used as the baseline
   
   The baseline is the median multiplied by the configured 
multiplier (`slow-task-detector.execution-time.baseline-multiplier`).
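   To make the point concrete, here is a minimal Python sketch of the baseline computation as described in this thread. It is not Flink's implementation: the function names, the `>=` comparison against the ratio, and the sample numbers are all assumptions for illustration. Only the two option names it mirrors (`baseline-ratio` and `baseline-multiplier`) come from the discussion.

```python
from statistics import median


def slow_task_baseline(finished_times, total_tasks, baseline_ratio, baseline_multiplier):
    """Return the slow-task baseline, or None if too few tasks have finished.

    Hypothetical helper: baseline_ratio and baseline_multiplier stand in for
    slow-task-detector.execution-time.baseline-ratio / baseline-multiplier.
    """
    if total_tasks <= 0 or not finished_times:
        return None
    # Assumption: the threshold counts as reached when the ratio is met or exceeded.
    if len(finished_times) / total_tasks < baseline_ratio:
        return None
    # The baseline is the median scaled by the multiplier, not the bare median.
    return median(finished_times) * baseline_multiplier


def find_slow_tasks(running_times, baseline):
    """Return the names of running tasks whose elapsed time exceeds the baseline."""
    if baseline is None:
        return []
    return [name for name, elapsed in running_times.items() if elapsed > baseline]


# Illustrative values only: 3 of 4 tasks finished, ratio 0.75, multiplier 1.5.
baseline = slow_task_baseline([10, 12, 14], total_tasks=4,
                              baseline_ratio=0.75, baseline_multiplier=1.5)
slow = find_slow_tasks({"task-a": 17, "task-b": 30}, baseline)
```

   With these sample values the median is 12 and the baseline is 18, so only `task-b` is reported. Against the bare median, `task-a` would be reported as well, which is why the doc should mention the multiplier.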



##########
docs/content/docs/deployment/speculative_execution.md:
##########
@@ -62,6 +62,14 @@ To make speculative execution work better for different 
jobs, you can tune below
 - [`execution.batch.speculative.max-concurrent-executions`]({{< ref 
"docs/deployment/config" 
>}}#execution-batch-speculative-speculative-max-concurrent-e)
 - [`execution.batch.speculative.block-slow-node-duration`]({{< ref 
"docs/deployment/config" 
>}}#execution-batch-speculative-speculative-block-slow-node)
 
+Currently, speculative execution uses the slow task detector based on 
execution time to detect slow tasks. 
+The detector will periodically count all finished executions, if the finished 
execution ratio reaches the threshold, 
+the median of the tasks' execution time will be defined as the baseline and 
the execution will 
+be detected as a slow task if its execution time exceeds the baseline. It is 
worth mentioning that 
+the execution time will be weighted with the input bytes of the execution 
vertex, so the executions 
+with large data volume differences but close computing power will not be 
detected as a slow task, 
+when data skew occurs. That will avoid starting invalid attempts.

Review Comment:
   That will avoid starting invalid attempts. -> This helps to avoid starting 
unnecessary speculative attempts.
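   For readers of the doc, a small example may help show why the weighting avoids unnecessary attempts under data skew. The Python sketch below assumes that "weighted with the input bytes" means comparing execution time per input byte. That interpretation, the helper names, and the numbers are assumptions for illustration, not Flink's actual code.

```python
from statistics import median


def time_per_byte(execution_time_s, input_bytes):
    """Normalize execution time by input size (assumed weighting scheme)."""
    if input_bytes <= 0:
        raise ValueError("input_bytes must be positive for this sketch")
    return execution_time_s / input_bytes


def is_slow(execution_time_s, input_bytes, baseline_rate):
    """A task is slow if its time per byte exceeds the weighted baseline."""
    return time_per_byte(execution_time_s, input_bytes) > baseline_rate


# Finished tasks as (seconds, input bytes); each processed 0.01 s per byte.
finished = [(10.0, 1_000), (12.0, 1_200), (11.0, 1_100)]
multiplier = 1.5  # illustrative value for the baseline multiplier

weighted_baseline = median(time_per_byte(t, b) for t, b in finished) * multiplier
raw_baseline = median(t for t, _ in finished) * multiplier

# Skewed task: 10x the data and 10x the time, so the same processing rate.
skewed_is_slow = is_slow(100.0, 10_000, weighted_baseline)
# Genuine straggler: normal data volume but 4x the time per byte.
straggler_is_slow = is_slow(40.0, 1_000, weighted_baseline)
```

   In this sketch the skewed task is not flagged by the weighted comparison, although its raw time of 100 s is far above the unweighted baseline of 16.5 s. The straggler is still flagged.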



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
