[
https://issues.apache.org/jira/browse/FLINK-34025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804364#comment-17804364
]
Emre Kartoglu commented on FLINK-34025:
---------------------------------------
A potential definition of the data skew percentage for a given operator:
_(maxRecordsReceivedBySubtask - minRecordsReceivedBySubtask) /
totalRecordsReceivedByAllSubtasks_
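
As a rough illustration of the definition above, a minimal Python sketch might look like the following. The function name `data_skew_percentage`, the input shape (a list of per-subtask record counts), the multiplication by 100 to express the ratio as a percentage, and returning 0 when no records have been received are all assumptions made for illustration; none of them come from the proposal or from Flink's API.

```python
def data_skew_percentage(records_per_subtask):
    """Return (max - min) / total of the per-subtask record counts, as a percentage.

    `records_per_subtask` is a hypothetical list holding the number of records
    received by each subtask of a single operator.
    """
    # Assumption: an operator with no subtasks or no records has no skew.
    if not records_per_subtask:
        return 0.0
    total = sum(records_per_subtask)
    if total == 0:
        return 0.0
    spread = max(records_per_subtask) - min(records_per_subtask)
    # Assumption: scale the ratio by 100 so it reads as a percentage.
    return 100.0 * spread / total


if __name__ == "__main__":
    # Evenly distributed input: no skew.
    print(data_skew_percentage([100, 100, 100, 100]))
    # A single subtask receives everything: maximum skew under this definition.
    print(data_skew_percentage([400, 0, 0, 0]))
```

Under this definition the score ranges from 0 (all subtasks receive the same number of records) to 100 (one subtask receives every record).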
> Show data skew score on Flink Dashboard
> ---------------------------------------
>
> Key: FLINK-34025
> URL: https://issues.apache.org/jira/browse/FLINK-34025
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Web Frontend
> Affects Versions: 1.19.0
> Reporter: Emre Kartoglu
> Priority: Major
> Labels: dashboard
> Attachments: skew_proposal.png, skew_tab.png
>
>
> *Problem:* Currently users have to click on every operator and check how much
> data each subtask is processing to see if there is data skew. This is
> particularly cumbersome and error-prone for jobs with big job graphs. Data
> skew is an important metric that should be more visible.
>
> *Proposed solution:*
> * Show a data skew score on each operator (see screenshot below). This would
> be an improvement but would not be sufficient, as it would still not be easy
> to see the data skew score for jobs with very large job graphs (it would
> require a lot of zooming in and out).
> * Show the data skew score for each operator under a new "Data Skew" tab next
> to the Exceptions tab (see screenshot below):
> !skew_tab.png|width=1226,height=719!
>
> !skew_proposal.png|width=845,height=253!