[ 
https://issues.apache.org/jira/browse/FLINK-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711681#comment-15711681
 ] 

Ufuk Celebi commented on FLINK-3160:
------------------------------------

OK, there was some confusion on my side, because I had a large test job running 
with 1 slot per task manager, which means that the sub tasks and task managers 
tabs show the same rows. This led me to think that something was wrong. It 
definitely makes sense to have stats aggregated by TM in the TM tab.

Also, the per sub task listings definitely stop being useful after a certain 
parallelism. The thing is that the checkpoint sub task statistics are ordered 
by sub task index (with listing of the index). If you now notice something for 
a single sub task, there is no easy way of finding out which TM this sub task 
runs on, how many records it processed, etc. Listing the sub task index in the 
Sub Tasks tab would help here, but that's definitely a story for a different 
issue. ;-)

> Aggregate operator statistics by TaskManager
> --------------------------------------------
>
>                 Key: FLINK-3160
>                 URL: https://issues.apache.org/jira/browse/FLINK-3160
>             Project: Flink
>          Issue Type: Improvement
>          Components: Webfrontend
>    Affects Versions: 1.0.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>             Fix For: 1.0.0
>
>
> The web client job info page presents a table of the following per task 
> statistics: start time, end time, duration, bytes received, records received, 
> bytes sent, records sent, attempt, host, status.
> Flink supports clusters with thousands of slots and a job setting a high 
> parallelism renders this job info page unwieldy and difficult to analyze in 
> real-time.
> It would be helpful to optionally or automatically aggregate statistics by 
> TaskManager. These rows could then be expanded to reveal the current per task 
> statistics.
> Start time, end time, duration, and attempt are not applicable to a 
> TaskManager since new tasks for repeated attempts may be started. Bytes 
> received, records received, bytes sent, and records sent are summed. Any 
> throughput metrics can be averaged over the total task time or time window. 
> Status could reference the number of running tasks on the TaskManager or an 
> idle state.
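
The aggregation rule in the issue description can be sketched roughly as 
follows. This is only an illustration under assumptions, not the web 
frontend's actual code: the SubtaskStats record, its field names, and the 
"RUNNING" status string are hypothetical stand-ins for whatever the job info 
page really receives. Counters are summed per host, running sub tasks are 
counted, and the sub task indices are kept per row so that a single sub task 
can be traced back to its TM.

```python
from dataclasses import dataclass


@dataclass
class SubtaskStats:
    # Hypothetical per sub task record; field names are illustrative only.
    subtask_index: int
    host: str
    status: str
    bytes_received: int
    records_received: int
    bytes_sent: int
    records_sent: int


def aggregate_by_task_manager(subtasks):
    """Group sub task rows by host, summing counters and counting running tasks."""
    rows = {}
    for s in subtasks:
        row = rows.setdefault(s.host, {
            "subtask_indices": [],   # lets a row be expanded back to its sub tasks
            "running": 0,            # number of sub tasks currently running
            "bytes_received": 0,
            "records_received": 0,
            "bytes_sent": 0,
            "records_sent": 0,
        })
        row["subtask_indices"].append(s.subtask_index)
        if s.status == "RUNNING":    # assumed status label
            row["running"] += 1
        row["bytes_received"] += s.bytes_received
        row["records_received"] += s.records_received
        row["bytes_sent"] += s.bytes_sent
        row["records_sent"] += s.records_sent
    return rows
```

Start time, end time, duration, and attempt are left out on purpose, since 
the description notes they are not applicable to a TaskManager.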



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
