Our Flink application runs on k8s. Yes, users can use async-profiler directly, but that is not convenient: they have to download the jars themselves and need to know how to use the tool, and some users don't know the tool at all. If we integrate it into Flink, users will benefit a lot.

On 2022/01/26 18:56:17 David Morávek wrote:
> I'd second Alex's concerns. Is there a reason why you can't use the
> async-profiler directly? In what kind of environment are your Flink
> clusters running (YARN / k8s / ...)?
>
> Best,
> D.
>
> On Wed, Jan 26, 2022 at 4:32 PM Alexander Fedulov <al...@ververica.com>
> wrote:
>
> > Hi Jacky,
> >
> > Could you please clarify what kind of *problems* you experience with the
> > large parallelism? You referred to D3, is it something related to rendering
> > on the browser side or is it about the samples collection process? Were you
> > able to identify the bottleneck?
> >
> > Fundamentally I have some concerns regarding the proposed approach:
> > 1. Calling shell scripts triggered via the web UI is a security concern and
> > it needs to be evaluated carefully if it could introduce any unexpected
> > attack vectors (depending on the implementation, passed parameters etc.)
> > 2. My understanding is that the async-profiler implementation is
> > system-dependent. How do you propose to handle multiple architectures?
> > Would you like to ship each available implementation within Flink? [1]
> > 3. Do you plan to make use of full async-profiler features including native
> > calls sampling with perf_events? If so, the issue I see is that some
> > environments restrict ptrace calls by default [2]
> >
> > [1] https://github.com/jvm-profiling-tools/async-profiler#download
> > [2]
> > https://kubernetes.io/docs/concepts/policy/pod-security-policy/#host-namespaces
> >
> > Best,
> > Alexander Fedulov
> >
> > On Wed, Jan 26, 2022 at 1:59 PM 李森 <li...@icloud.com.invalid> wrote:
> >
> > > This is an expected feature, as we also experienced browser crashes on
> > > existing operator-level flame graphs
> > >
> > > Best,
> > > Echo Lee
> > >
> > > > On Jan 24, 2022, at 6:16 PM, David Morávek <da...@gmail.com> wrote:
> > > >
> > > > Hi Jacky,
> > > >
> > > > The link seems to be broken, here is the correct one [1].
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-213%3A+TaskManager%27s+Flame+Graphs
> > > >
> > > > Best,
> > > > D.
> > > >
> > > >> On Mon, Jan 24, 2022 at 9:48 AM Jacky Lau <28...@qq.com.invalid>
> > > >> wrote:
> > > >>
> > > >> Hi All,
> > > >> I would like to start the discussion on FLIP-213
> > > >> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-213%3A+TaskManager%27s+Flame+Graphs>,
> > > >> which aims to provide TaskManager-level (process-level) flame graphs
> > > >> using async-profiler, which is the most popular tool for Java
> > > >> performance analysis; Arthas and IntelliJ both use it.
> > > >> We already support it at our company, Ant Group.
> > > >> Flink currently supports FLIP-165: Operator's Flame Graphs, which
> > > >> draws the flame graph with the front-end library d3-flame-graph and
> > > >> has some problems in jobs with large parallelism.
> > > >> Please be aware that the FLIP wiki page is not fully done, since I
> > > >> don't know whether it will be accepted by the Flink community.
> > > >> Feel free to add your thoughts to make this feature better! I am
> > > >> looking forward to all your responses. Thanks very much!
> > > >>
> > > >> Best,
> > > >> Jacky Lau