[
https://issues.apache.org/jira/browse/FLINK-30185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17639816#comment-17639816
]
Rui Fan commented on FLINK-30185:
---------------------------------
Hi [~xtsong], thanks for your reply.
The improvement mainly consists of two parts:
# How does the web frontend show the flame graph for a single subtask?
# How does the backend store and fetch the thread info samples for a single subtask?
h2. Web Frontend
As with Metrics, we need to add a select box for choosing either all subtasks
or a single subtaskIndex.
The selected subtaskIndex is then passed to the backend.
!image-2022-11-28-14-48-20-462.png!
!image-2022-11-28-14-38-47-145.png|width=783,height=286!
h2. Backend
h3. 1. Refactor the cache logic
Currently, the cache key of ThreadInfo is jobId + jobVertexId. The cache key
should be changed to jobId + jobVertexId + subtaskIndex.
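A minimal sketch of what such a composite key could look like. The class name {{SubtaskCacheKey}}, the plain String ids, and the {{ALL_SUBTASKS}} sentinel are assumptions made for illustration only; they are not existing Flink classes or constants.

```java
import java.util.Objects;

/** Hypothetical composite cache key (jobId + jobVertexId + subtaskIndex); not a Flink class. */
final class SubtaskCacheKey {

    /** Assumed sentinel meaning "all subtasks of the vertex combined". */
    static final int ALL_SUBTASKS = -1;

    private final String jobId;
    private final String jobVertexId;
    private final int subtaskIndex;

    SubtaskCacheKey(String jobId, String jobVertexId, int subtaskIndex) {
        this.jobId = Objects.requireNonNull(jobId);
        this.jobVertexId = Objects.requireNonNull(jobVertexId);
        this.subtaskIndex = subtaskIndex;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof SubtaskCacheKey)) {
            return false;
        }
        SubtaskCacheKey that = (SubtaskCacheKey) o;
        // All three components must match, so samples of different subtasks never collide.
        return subtaskIndex == that.subtaskIndex
                && jobId.equals(that.jobId)
                && jobVertexId.equals(that.jobVertexId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(jobId, jobVertexId, subtaskIndex);
    }
}
```

With a key like this, the combined sample for the whole vertex and the sample for each single subtask occupy separate cache entries, so the existing vertex-level flame graph keeps working unchanged.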
h3. 2. Add the subtaskIndex
Allow requesting thread info from a single subtask.
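As a rough illustration of the intended behavior, the backend could resolve an optional subtaskIndex into the list of subtasks to sample. The class {{SubtaskSelection}} and the method {{subtasksToSample}} are hypothetical names for this sketch, not part of Flink's API, and the range check is an assumption about how invalid indices might be handled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.OptionalInt;

/** Hypothetical helper; shows the selection logic only, not Flink's implementation. */
final class SubtaskSelection {

    private SubtaskSelection() {}

    /**
     * Returns the subtask indices whose threads should be sampled.
     * An empty subtaskIndex means "all subtasks", matching the current behavior.
     */
    static List<Integer> subtasksToSample(int parallelism, OptionalInt subtaskIndex) {
        if (!subtaskIndex.isPresent()) {
            List<Integer> all = new ArrayList<>(parallelism);
            for (int i = 0; i < parallelism; i++) {
                all.add(i);
            }
            return all;
        }
        int index = subtaskIndex.getAsInt();
        // Assumed validation: reject indices outside [0, parallelism).
        if (index < 0 || index >= parallelism) {
            throw new IllegalArgumentException(
                    "subtaskIndex " + index + " is out of range for parallelism " + parallelism);
        }
        return Collections.singletonList(index);
    }
}
```

For example, {{subtasksToSample(3, OptionalInt.empty())}} yields the indices 0, 1 and 2, while {{subtasksToSample(3, OptionalInt.of(1))}} yields only index 1.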
If anything is wrong or missing, please let me know. Thanks!
> Provide the flame graph to the subtask level
> --------------------------------------------
>
> Key: FLINK-30185
> URL: https://issues.apache.org/jira/browse/FLINK-30185
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / REST, Runtime / Web Frontend
> Reporter: Rui Fan
> Priority: Major
> Fix For: 1.17.0
>
> Attachments: image-2022-11-24-14-49-42-845.png,
> image-2022-11-28-14-38-47-145.png, image-2022-11-28-14-48-20-462.png
>
>
> FLINK-13550 added support for CPU flame graphs in the web UI.
> As the Flink documentation notes:
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/debugging/flame_graphs/#sampling-process
> {code:java}
> Note: Stack trace samples from all threads of an operator are combined
> together. If a method call consumes 100% of the resources in one of the
> parallel tasks but none in the others, the bottleneck might be obscured by
> being averaged out.
> There are plans to address this limitation in the future by providing “drill
> down” visualizations to the task level. {code}
>
> A flame graph at the subtask level is very useful when only a small number of
> subtasks are bottlenecked, so we should provide flame graphs at the
> subtask level.
>
> !image-2022-11-24-14-49-42-845.png!
>
>