Hi Sebastian!

There is some profiling code that was used by previous versions of Flink (Stratosphere). The profiling works, but there is currently nothing that displays the profiling data.
It would be a great addition to display the profiling data in the web frontend, or to make it available for download.

Have a look at these classes:

- JobManager side: https://github.com/apache/incubator-flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/profiling/impl/JobManagerProfilerImpl.java
- TaskManager side: https://github.com/apache/incubator-flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/profiling/impl/TaskManagerProfilerImpl.java

Daniel Warneke authored those; maybe he can chime in and give a few pointers.

Greetings,
Stephan

On Tue, Aug 19, 2014 at 11:08 AM, Kruse, Sebastian <[email protected]> wrote:

> Hi everyone,
>
> I want to profile my flink jobs to find bottlenecks. I read the issue
> https://issues.apache.org/jira/browse/FLINK-964 and my question is
> whether there are currently ongoing efforts to bring the profiling data to
> the web frontend.
>
> Additionally, I was thinking of some kind of logical profiling, that
> measures the elements (like tuples) being passed among the operators. That
> way one could better understand the properties of intermediate data, e.g.,
> join cardinalities. Plotting these data against a time axis, one would come
> up with something like a data flow profile of the job. However, before
> engaging in creating such profiles, I wanted to ask you if the system
> already keeps track of such data. For instance, the job history graphs
> provide something similar, but the scheduling states of tasks are not
> necessarily identical to the data flow through them.
> I am happy for any comments!
>
> Cheers,
> Sebastian
>
