Hi,

On 24.6.2019 19.36, Elie Tournier wrote:
Great topic. For the past few days, I've been looking at a CI for Mesa:
https://gitlab.freedesktop.org/hopetech/tracie
OK, it's in a very very alpha stage. ;)

My idea was to use apitrace to dump and replay traces, then compare the
images with reference images or with images dumped the previous week.
Apitrace was a good choice for a "correctness CI", maybe not for a
"performance CI".

@eric Out of curiosity, did you look at apitrace or did you go straight
to renderdoc?

Note: ezBench supports both Apitrace & vktrace.


I've added some comments below based on what I learned playing with the CI.


On Sat, Jun 22, 2019 at 10:59:34AM -0700, Rob Clark wrote:
On Thu, Jun 20, 2019 at 12:26 PM Eric Anholt <e...@anholt.net> wrote:

Hey folks, I wanted to show you this follow-on to shader-db I've been
working on:

https://gitlab.freedesktop.org/anholt/renderdoc-traces

"On each frame drawn, renderdoccmd replay sets up the initial GL state again. This will include compiling programs."

Ouch.  This makes it pretty much useless for performance testing.


For x86 development I've got a collection of ad-hoc scripts to capture
FPS numbers from various moderately interesting open source apps so I
could compare-perf them.  I was only looking at specific apps when they
seemed relevant, so it would be easy to miss regressions.

Starting work on freedreno, one of the first questions I ran into was
"does this change to the command stream make the driver faster?".  I
don't have my old set of apps on my debian ARM systems, and even less so
for Chrome OS.  Ultimately, users will be judging us based on web
browser and android app performance, not whatever I've got laying around
on my debian system.  And, I'd love to fix that "I ignore apps unless I
think of them" thing.

So, I've used renderdoc to capture some traces from Android apps.  With
an unlocked phone, it's pretty easy.  Tossing those in a repo (not
shared here), I can then run driver changes past them to see what
happens.  See
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1134 for some
results.

Where is this repo going from here?

- Add a runner for doing frame-to-frame consistency tests.  We could
   catch UB in a lot of circumstances by replaying a few times and making
   sure that results are consistent.  Comparing frames between drivers
   might also be interesting, though for that you would need human
   validation since pixel values and pixels lit will change on many
   shader optimization changes.
Comparing frames between drivers is hard. I tried comparing LLVMpipe,
softpipe and i965: they all produce different frames.
Human validation is sadly hard to avoid. One of the ideas Erik came up
with was to use a mask.

A statistical approach could be better (like how error is handled in
video compression).
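
As an illustration, PSNR would be one such metric borrowed from video
compression; the sketch below (numpy and Pillow assumed, the 40 dB cut-off
is made up) scores how far a new frame is from the reference instead of
requiring bit-exact output:

    # Sketch: score a frame against a reference with PSNR rather than
    # requiring an exact match.  Assumes both images have the same size.
    import math

    import numpy as np
    from PIL import Image

    def psnr(ref_path, new_path):
        ref = np.asarray(Image.open(ref_path).convert("RGB"), dtype=np.float64)
        new = np.asarray(Image.open(new_path).convert("RGB"), dtype=np.float64)
        mse = np.mean((ref - new) ** 2)
        if mse == 0:
            return math.inf          # identical images
        return 10 * math.log10(255.0 ** 2 / mse)

    def frame_acceptable(ref_path, new_path, threshold_db=40.0):
        return psnr(ref_path, new_path) >= threshold_db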


I think we should first focus on comparing frames from the same driver
and extend later.
The subject is hard enough. ;)

Note that there are some benchmarks which don't produce stable rendering
results because they include random input, so you can't do automated
rendering-difference detection for them.  I would suggest just dropping
those (I don't remember anymore which benchmarks those were, but Martin
Peres might).


- Need to collect more workloads for the public repo:
I would be happy to help here.
We should create a list of FOSS games/apps to dump based on their OpenGL
requirements.

   - I've tried to capture webgl on Chrome and Firefox on Linux with no
     luck. WebGL on ff is supposed to work under apitrace, maybe I could
     do that and then replay on top of renderdoc to capture.

perhaps worth a try capturing these on android?

I have managed to apitrace chromium-browser in the past.. it ends up a
bit weird because there are multiple contexts, but apitrace has
managed to replay them.  Maybe the multiple ctx thing is confusing
renderdoc?

(tbh I've not really played w/ renderdoc yet.. I should probably do so..)

   - Mozilla folks tell me that firefox's WebRender display lists can be
     captured in browser and then replayed from the WR repo under
     apitrace or renderdoc.

   - I tried capturing Mozilla's new Pathfinder (think SVG renderer), but
     it wouldn't play the demo under renderdoc.

   Do you have some apps that should be represented here?

- Add microbenchmarks?  Looks like it would be pretty easy to grab
   piglit drawoverhead results, not using renderdoc.  Capturing from
   arbitrary apps expands the scope of the repo in a way I'm not sure I'm
   excited about (Do we do different configs in those apps?  Then we need
   config infrastructure.  Ugh).

- I should probably add an estimate of "does this overall improve or
   hurt perf?"  Yay doing more stats.
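
   One simple way to boil that down would be the geometric mean of the
   per-trace after/before FPS ratios; a sketch (names and numbers are made
   up, not real results):

       # Sketch: summarize per-trace FPS changes as one number using the
       # geometric mean of after/before ratios (1.0 = no change).
       import math

       def overall_speedup(before, after):
           # before/after: dicts mapping trace name -> mean FPS
           ratios = [after[t] / before[t] for t in before]
           return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

       # e.g. overall_speedup({"app_a": 60.0, "app_b": 30.0},
       #                      {"app_a": 63.0, "app_b": 30.5})  ->  ~1.033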

A good way to measure perf could be repeating a specific frame in a trace
(when that doesn't include re-compiling the shaders).

If I remember correctly, that's already supported by vktrace and
(apitrace-based) frameretrace.
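
As an illustration of the measurement loop around such a single-frame
replay (the replay command below is just a placeholder, not the actual
vktrace or frameretrace invocation):

    # Sketch: time N replays of a single frame and report median + spread.
    # Measures whole-process wall time, so trace loading is included.
    # REPLAY_CMD is a placeholder; substitute whatever replayer you use.
    import statistics
    import subprocess
    import time

    REPLAY_CMD = ["./replay-one-frame", "trace.file"]   # hypothetical

    def measure(runs=10):
        times = []
        for _ in range(runs):
            start = time.perf_counter()
            subprocess.run(REPLAY_CMD, check=True, capture_output=True)
            times.append(time.perf_counter() - start)
        return statistics.median(times), statistics.stdev(times)

    if __name__ == "__main__":
        median, stdev = measure()
        print(f"median {median * 1000:.2f} ms, stdev {stdev * 1000:.2f} ms")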


Sure. Sadly, most benchmarks I tried were unstable performance-wise.
Caching changes the results a lot. Well, you already know that.
If shader cache changes things, shaders are compiled during
benchmarking, which means it's a bad benchmark.  Shader compilation
should be benchmarked separately.

Or if you meant CPU caches...  Completely unrelated changes can
impact CPU speed because code gets aligned slightly differently in
memory which affects cache access patterns.  I.e. some performance
change can be completely accidental and disappear with another
completely unrelated code change.

Note also that I've found that in memory-bandwidth-bound test cases
there can be ~10% variation on Intel based on how memory mappings happen
to get aligned (which can change between boots even more than between
individual process runs, or just from LD_PRELOADing a library that isn't
even used).

Because of the latter, one sees real performance changes better by
running tests across different commits (i.e. a continuous per-commit perf
trend) than by just doing repeats with a single build.


- I'd love to drop scipy.  I only need it for stats.t.ppf, but it
   prevents me from running run.py directly on my targets.

How much do you need PPF?  Maybe you could use some simpler statistics
(e.g. from the python3 builtin statistics module) if the scipy import fails?
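
For example, something like the following fallback might be enough,
assuming the t-distribution is only needed for confidence intervals and a
normal approximation is acceptable at the sample counts involved (just a
sketch; NormalDist needs Python >= 3.8):

    # Sketch: fall back to the stdlib when scipy isn't available.
    # NormalDist is a normal (not Student's t) approximation, so the
    # resulting interval will be a bit too narrow for small sample counts.
    try:
        from scipy import stats

        def t_ppf(confidence, dof):
            return stats.t.ppf(confidence, dof)
    except ImportError:
        from statistics import NormalDist

        def t_ppf(confidence, dof):
            return NormalDist().inv_cdf(confidence)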


        - Eero

thoughts about adding amd_perfcntr/etc support?  I guess some of the
perfcntrs we have perhaps want some post-processing to turn into
usable numbers, and for plenty of them we don't know much about what
they are other than the name.  But some of them are easy enough to
understand (like # of fs ALU cycles, etc), and being able to compare
that before/after shader optimizations seems useful.

Also, it would be nice to have a way to extract "slow frames" somehow
(maybe out of scope for this tool, but related?).. i.e. when framerate
suddenly drops, those are the frames we probably want to look at more
closely..
+1
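
Given per-frame times, that extraction could start out as simple as
flagging outliers against the median; a sketch with an arbitrary
threshold:

    # Sketch: flag "slow frames" as those taking much longer than the
    # median frame time.  The 2x factor is arbitrary.
    import statistics

    def slow_frames(frame_times_ms, factor=2.0):
        median = statistics.median(frame_times_ms)
        return [(i, t) for i, t in enumerate(frame_times_ms)
                if t > factor * median]

    # e.g. slow_frames([16.6, 16.8, 16.5, 55.0, 16.7]) -> [(3, 55.0)]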
