I'd second Alex's concerns. Is there a reason why you can't use async-profiler directly? In what kind of environment are your Flink clusters running (YARN / k8s / ...)?

Best,
D.

On Wed, Jan 26, 2022 at 4:32 PM Alexander Fedulov <alexan...@ververica.com> wrote:

> Hi Jacky,
>
> Could you please clarify what kind of *problems* you experience with the
> large parallelism? You referred to D3, is it something related to rendering
> on the browser side or is it about the samples collection process? Were you
> able to identify the bottleneck?
>
> Fundamentally I have some concerns regarding the proposed approach:
> 1. Calling shell scripts triggered via the web UI is a security concern and
> it needs to be evaluated carefully if it could introduce any unexpected
> attack vectors (depending on the implementation, passed parameters etc.)
> 2. My understanding is that the async-profiler implementation is
> system-dependent. How do you propose to handle multiple architectures?
> Would you like to ship each available implementation within Flink? [1]
> 3. Do you plan to make use of full async-profiler features including native
> calls sampling with perf_events? If so, the issue I see is that some
> environments restrict ptrace calls by default [2]
>
> [1] https://github.com/jvm-profiling-tools/async-profiler#download
> [2] https://kubernetes.io/docs/concepts/policy/pod-security-policy/#host-namespaces
>
> Best,
> Alexander Fedulov
>
> On Wed, Jan 26, 2022 at 1:59 PM 李森 <lisen...@icloud.com.invalid> wrote:
>
> > This is an expected feature, as we also experienced browser crashes on
> > existing operator-level flame graphs
> >
> > Best,
> > Echo Lee
> >
> > > On Jan 24, 2022, at 6:16 PM, David Morávek <david.mora...@gmail.com> wrote:
> > >
> > > Hi Jacky,
> > >
> > > The link seems to be broken, here is the correct one [1].
> > >
> > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-213%3A+TaskManager%27s+Flame+Graphs
> > >
> > > Best,
> > > D.
> > >
> > >> On Mon, Jan 24, 2022 at 9:48 AM Jacky Lau <281293...@qq.com.invalid> wrote:
> > >>
> > >> Hi All,
> > >> I would like to start the discussion on FLIP-213
> > >> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-213%3A+TaskManager%27s+Flame+Graphs>,
> > >> which aims to provide TaskManager-level (process-level) flame graphs
> > >> using async-profiler, the most popular Java performance tool; Arthas
> > >> and IntelliJ both use it.
> > >> We already support it at Ant Group.
> > >> Flink currently supports FLIP-165: Operator's Flame Graphs, which draws
> > >> flame graphs with the front-end library d3-flame-graph; this has some
> > >> problems for jobs with large parallelism.
> > >> Please be aware that the FLIP wiki page is not fully done, since I
> > >> don't know whether it will be accepted by the Flink community.
> > >> Feel free to add your thoughts to make this feature better! I am
> > >> looking forward to all your responses. Thanks very much!
> > >>
> > >> Best,
> > >> Jacky Lau