My 2c for Mali/Panfrost -- For us, capturing GPU perf counters is orthogonal to rendering. It's expected (e.g. with Arm's tools) to do this from a separate process. Neither Mesa nor the DDK should require custom instrumentation for the low-level data. Fahien's gfx-pps handles this correctly for Panfrost + Perfetto as it is. So for us I don't see the value in modifying Mesa for tracing.

On Fri, Feb 12, 2021 at 01:34:51PM -0800, John Bates wrote:
> (responding from correct address this time)
>
> On Fri, Feb 12, 2021 at 12:03 PM Mark Janes <mark.a.ja...@intel.com> wrote:
>
> > I've recently been using GPUVis to look at trace events. On Intel
> > platforms, GPUVis incorporates ftrace events from the i915 driver,
> > performance metrics from igt-gpu-tools, and userspace ftrace markers
> > that I locally hack up in Mesa.
>
> GPUVis is great. I would love to see that data combined with
> userspace events without any need for local hacks. Perfetto provides
> on-demand trace events with lower overhead compared to ftrace, so for
> example it is acceptable to have production trace instrumentation that can
> be captured without dev builds. To do that with ftrace it may require a way
> to enable and disable the ftrace file writes to avoid the overhead when
> tracing is not in use. This is what Android does with systrace/atrace, for
> example, it uses Binder to notify processes about trace sessions. Perfetto
> does that in a more portable way.
>
> > It is very easy to compile the GPUVis UI. Userspace instrumentation
> > requires a single C/C++ header. You don't have to access an external
> > web service to analyze trace data (a big no-no for devs working on
> > preproduction hardware).
> >
> > Is it possible to build and run the Perfetto UI locally?
>
> Yes, local UI builds are possible
> <https://github.com/google/perfetto/blob/5ff758df67da94d17734c2e70eb6738c4902953e/ui/README.md>.
> Also confirmed with the perfetto team <https://discord.gg/35ShE3A> that
> trace data is not uploaded unless you use the 'share' feature.
>
> > Can it display
> > arbitrary trace events that are written to
> > /sys/kernel/tracing/trace_marker ?
>
> Yes, I believe it does support that via linux.ftrace data source
> <https://perfetto.dev/docs/quickstart/linux-tracing>. We use that for
> example to overlay CPU sched data to show what process is on each core
> throughout the timeline. There are many ftrace event types
> <https://github.com/google/perfetto/tree/5ff758df67da94d17734c2e70eb6738c4902953e/protos/perfetto/trace/ftrace>
> in the perfetto protos.
>
> > Can it be extended to show i915 and
> > i915-perf-recorder events?
>
> It can be extended to consume custom data sources. One way this is done is
> via a bridge daemon, such as traced_probes which is responsible for
> capturing data from ftrace and /proc during a trace session and sending it
> to traced. traced is the main perfetto tracing daemon that notifies all
> trace data sources to start/stop tracing and communicates with user tracing
> requests via the 'perfetto' command.
>
> > John Bates <jba...@chromium.org> writes:
> >
> > > I recently opened issue 4262
> > > <https://gitlab.freedesktop.org/mesa/mesa/-/issues/4262> to begin the
> > > discussion on integrating perfetto into mesa.
> > >
> > > *Background*
> > >
> > > System-wide tracing is an invaluable tool for developers to find and fix
> > > performance problems. The perfetto project enables a combined view of
> > > trace data from kernel ftrace, GPU driver and various
> > > manually-instrumented tracepoints throughout the application and system.
> > > This helps developers quickly answer questions like:
> > >
> > >    - How long are frames taking?
> > >    - What caused a particular frame drop?
> > >    - Is it CPU bound or GPU bound?
> > >    - Did a CPU core frequency drop cause something to go slower than
> > >    usual?
> > >    - Is something else running that is stealing CPU or GPU time? Could I
> > >    fix that with better thread/context priorities?
> > >    - Are all CPU cores being used effectively? Do I need
> > >    sched_setaffinity to keep my thread on a big or little core?
> > >    - What’s the latency between CPU frame submit and GPU start?
> > >
> > > *What Does Mesa + Perfetto Provide?*
> > >
> > > Mesa is in a unique position to produce GPU trace data for several GPU
> > > vendors without requiring the developer to build and install additional
> > > tools like gfx-pps <https://gitlab.freedesktop.org/Fahien/gfx-pps>.
> > >
> > > The key is making it easy for developers to use. Ideally, perfetto is
> > > eventually available by default in mesa so that if your system has
> > > perfetto traced running, you just need to run perfetto (perhaps along
> > > with setting an environment variable) with the mesa categories to see:
> > >
> > >    - GPU processing timeline events.
> > >    - GPU counters.
> > >    - CPU events for potentially slow functions in mesa like shader
> > >    compiles.
> > >
> > > Example of what this data might look like (with fake GPU events):
> > > [image: percetto-gpu-example.png]
> > >
> > > *Runtime Characteristics*
> > >
> > >    - ~500KB additional binary size. Even with using only the basic
> > >    features of perfetto, it will increase the binary size of mesa by
> > >    about 500KB.
> > >    - Background thread. Perfetto uses a background thread for
> > >    communication with the system tracing daemon (traced) to advertise
> > >    trace data and get notification of trace start/stop.
> > >    - Runtime overhead when disabled is designed to be optimal with one
> > >    predicted branch, typically a few CPU cycles
> > >    <https://perfetto.dev/docs/instrumentation/track-events#performance>
> > >    per event. While enabled, the overhead can be around 1 us per event.
> > >
> > > *Integration Challenges*
> > >
> > >    - The perfetto SDK is C++ and designed around macros, lambdas, inline
> > >    templates, etc. There are ongoing discussions on providing an official
> > >    perfetto C API, but it is not yet clear when this will land on the
> > >    perfetto roadmap.
> > >    - The perfetto SDK is an amalgamated .h and .cc that adds up to 100K
> > >    lines of code.
> > >    - Anything that includes perfetto.h takes a long time to compile.
> > >    - The current Perfetto SDK design is incompatible with being a shared
> > >    library behind a C API.
> > >
> > > *Percetto*
> > >
> > > The percetto library <https://github.com/olvaffe/percetto> was recently
> > > implemented to provide an interim C API for perfetto. It provides
> > > efficient support for scoped trace events, multiple categories,
> > > counters, custom timestamps, and debug data annotations. Percetto also
> > > provides some features that are important to mesa, but not available
> > > yet with perfetto SDK:
> > >
> > >    - Trace events from multiple perfetto instances in separate shared
> > >    libraries (like mesa and virglrenderer) show correctly in a single
> > >    process and thread view.
> > >    - Counter tracks and macro API.
> > >
> > > Percetto is missing API for perfetto's GPU DataSource and counter
> > > support, but that feature could be implemented next if it is important
> > > for mesa. With the existing percetto API mesa could present GPU trace
> > > data as named 'slice' events and int64_t counters with custom
> > > timestamps as shown in the image above (based on this sample
> > > <https://github.com/olvaffe/percetto/blob/main/examples/timestamps.c>).
> > >
> > > *Mesa Integration Alternatives*
> > >
> > > Note: we have some pressing needs for performance analysis in Chrome OS,
> > > so I'm intentionally leaving out the alternative of waiting for an
> > > official perfetto C API. Of course, once that C API is available it
> > > would become an option to migrate to it from any of the alternatives
> > > below.
> > >
> > > Ordered by difficulty with easiest first:
> > >
> > >    1. Statically link with percetto as an optional external dependency
> > >    (virglrenderer now has this approach
> > >    <https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/480>).
> > >       - Pros: API already supports most common tracing needs. Tested and
> > >       used by an increasing number of CrOS components.
> > >       - Cons: External dependency for optional mesa build option.
> > >    2. Embed Perfetto SDK + a Percetto fork/copy.
> > >       - Pros: API already supports most common tracing needs. No added
> > >       external dependency for mesa.
> > >       - Cons: Percetto code divergence, bug fixes need to land in two
> > >       trees.
> > >    3. Embed Perfetto SDK + custom C wrapper.
> > >       - Pros: Tailored API for mesa's needs.
> > >       - Cons: Nontrivial development efforts and maintenance.
> > >    4. Generate C stubs for the Perfetto protobuf and reimplement the
> > >    Perfetto SDK in C.
> > >       - Pros: Tailored API for mesa's needs. Possible smaller binary
> > >       impact from simpler implementation.
> > >       - Cons: Significant development efforts and maintenance.
> > >
> > > Regardless of the integration direction, I expect we would disable
> > > perfetto in the default build for now to minimize disruption.
> > >
> > > I like #1, because there are some nontrivial subtleties to the C wrapper
> > > that provide both API conveniences and runtime performance that would
> > > need to be reimplemented or maintained with the other options. I will
> > > also volunteer to do #1 or #2, but I'm not sure I have time for #3 or
> > > #4 :D.
> > >
> > > Any other thoughts on how best to integrate perfetto into mesa?
> > >
> > > -jb
> > > _______________________________________________
> > > mesa-dev mailing list
> > > mesa-dev@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
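
On the ftrace point quoted above (enabling and disabling the trace_marker
writes so that instrumentation costs almost nothing when no trace session is
running), here is a minimal C sketch of that kind of gating. It is an
illustration only, not how Mesa, GPUVis, atrace or Perfetto implement it:
trace_start, trace_stop and trace_mark are hypothetical names, and the only
detail taken from the thread is the /sys/kernel/tracing/trace_marker path.
How a process learns that a session has started (Binder on Android, traced
for Perfetto) is deliberately left out.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdatomic.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helpers for illustration; not part of any real library. */
static atomic_int trace_enabled; /* 0 = tracing off, 1 = tracing on */
static int marker_fd = -1;       /* fd for the marker file, -1 when closed */

/* Open the marker file and switch tracing on. Returns 0 on success, -1 if
 * the file cannot be opened for writing. */
static int trace_start(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    marker_fd = fd;
    atomic_store_explicit(&trace_enabled, 1, memory_order_release);
    return 0;
}

/* Switch tracing off and close the marker file. This sketch does not
 * protect against a trace_mark() racing with trace_stop() on another
 * thread; a real implementation would need to handle that. */
static void trace_stop(void)
{
    atomic_store_explicit(&trace_enabled, 0, memory_order_release);
    if (marker_fd >= 0) {
        close(marker_fd);
        marker_fd = -1;
    }
}

/* Emit one marker string. When tracing is off the cost is one atomic load
 * and one branch, and no system call is made. Returns the number of bytes
 * written, 0 when tracing is off, or -1 on a write error. */
static ssize_t trace_mark(const char *msg)
{
    if (!atomic_load_explicit(&trace_enabled, memory_order_acquire))
        return 0;
    return write(marker_fd, msg, strlen(msg));
}
```

A caller would invoke trace_start("/sys/kernel/tracing/trace_marker") when
told that a session began, call trace_mark() at the points of interest, and
call trace_stop() when the session ends. Writing to that file normally
requires suitable permissions on tracefs.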