It's great that we have PGO support in Go now, and it's relatively easy to 
use.  One thing that mildly concerns me is that typically you'll be 
checking the default.pgo file into the repo, and git is notoriously bad 
at handling binary blobs that are updated frequently.  Its delta 
compression doesn't really work on compressed data, of course, which means 
that in a standard (non-shallow) clone that pulls down all history, these 
files can over time contribute significantly to clone time and the size of 
the `.git` directory.  git-lfs avoids the cumulative burden, but has 
its own set of problems.  At a few hundred kB each, the profiles aren't so 
concerning, but if you had, for example, an automated system that collected 
data from production and updated the profile every few days (which you 
want to do if you're deploying new code that often), possibly for multiple 
executables, it starts to add up.  So far I've been recompressing the pprof 
data (which is gzip-compressed by default) with zopfli, but I was 
thinking about making a tool that does better than that.

Firstly, we can drop entries from the mappings table that aren't 
referenced by any location in any sample.  For a Linux cgo binary this will 
include things like libc.  We can then also drop the corresponding entries 
from the string table.
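
A rough sketch of that mark-and-sweep, using simplified stand-ins for the 
pprof messages (the real types live in github.com/google/pprof/profile, 
whose Compact method already does this kind of garbage collection on 
unreferenced objects, if I'm reading it right):

```go
package main

import "fmt"

// Simplified stand-ins for the pprof protobuf messages; a real tool would
// use github.com/google/pprof/profile and also sweep the string table.
type Mapping struct {
	ID   uint64
	File string
}

type Location struct {
	ID        uint64
	MappingID uint64
}

type Sample struct {
	LocationIDs []uint64
}

// dropUnreferencedMappings keeps only mappings that are reachable from some
// sample via its locations.
func dropUnreferencedMappings(samples []Sample, locs []Location, maps []Mapping) []Mapping {
	usedLoc := map[uint64]bool{}
	for _, s := range samples {
		for _, id := range s.LocationIDs {
			usedLoc[id] = true
		}
	}
	usedMap := map[uint64]bool{}
	for _, l := range locs {
		if usedLoc[l.ID] {
			usedMap[l.MappingID] = true
		}
	}
	var kept []Mapping
	for _, m := range maps {
		if usedMap[m.ID] {
			kept = append(kept, m)
		}
	}
	return kept
}

func main() {
	maps := []Mapping{{1, "/app/mybinary"}, {2, "/lib/x86_64-linux-gnu/libc.so.6"}}
	locs := []Location{{ID: 10, MappingID: 1}, {ID: 20, MappingID: 2}}
	samples := []Sample{{LocationIDs: []uint64{10}}} // nothing was ever sampled in libc
	for _, m := range dropUnreferencedMappings(samples, locs, maps) {
		fmt.Println(m.File)
	}
}
```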

We can drop values for sample_types that PGO doesn't look at 
<https://cs.opensource.google/go/go/+/master:src/cmd/compile/internal/pgo/irgraph.go;l=162-163>. 
That is, PGO uses only the first sample_type that is "samples"/"count" or 
"cpu"/"nanoseconds".  Most profiles seem to have both, but we only need to 
keep one of them.
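
Sketched out, picking the one value column PGO would use (the type/unit 
string pairs come from the irgraph code linked above; the rest is just 
illustrative):

```go
package main

import "fmt"

// ValueType mirrors a pprof sample_type entry: a type/unit string pair.
type ValueType struct {
	Type, Unit string
}

// pgoValueIndex returns the index of the first sample_type PGO looks at,
// "samples"/"count" or "cpu"/"nanoseconds"; every other value column could
// then be dropped from every sample. Returns -1 if neither is present.
func pgoValueIndex(types []ValueType) int {
	for i, vt := range types {
		if (vt.Type == "samples" && vt.Unit == "count") ||
			(vt.Type == "cpu" && vt.Unit == "nanoseconds") {
			return i
		}
	}
	return -1
}

func main() {
	types := []ValueType{{"samples", "count"}, {"cpu", "nanoseconds"}}
	fmt.Println(pgoValueIndex(types)) // prints 0: keep the "samples"/"count" column
}
```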

Next, it seems that PGO ignores all but the last two stack frames 
<https://cs.opensource.google/go/go/+/master:src/cmd/compile/internal/pgo/internal/graph/graph.go;l=268-270>. 
This is of course an implementation detail subject to change, but the logic 
behind it is sound, so it's probably still safe to truncate stack frames at 
least somewhat.  Doing so would likely let many samples be merged, which 
could significantly reduce the uncompressed size of the profile.
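
As a toy illustration of the truncate-and-merge idea (stacks here are 
leaf-first, as in pprof; a real transform would operate on location IDs 
rather than strings):

```go
package main

import (
	"fmt"
	"strings"
)

// A sample, reduced to its call stack (leaf frame first) and a count.
type Sample struct {
	Stack []string
	Count int64
}

// truncateAndMerge keeps only the `depth` leaf-most frames of each stack
// (two, to mirror what PGO currently looks at) and merges samples whose
// truncated stacks now collide, summing their counts.
func truncateAndMerge(samples []Sample, depth int) []Sample {
	merged := map[string]*Sample{}
	var order []string // preserve first-seen order for deterministic output
	for _, s := range samples {
		stack := s.Stack
		if len(stack) > depth {
			stack = stack[:depth]
		}
		key := strings.Join(stack, "\x00")
		if m, ok := merged[key]; ok {
			m.Count += s.Count
		} else {
			merged[key] = &Sample{Stack: stack, Count: s.Count}
			order = append(order, key)
		}
	}
	out := make([]Sample, 0, len(order))
	for _, k := range order {
		out = append(out, *merged[k])
	}
	return out
}

func main() {
	in := []Sample{
		{Stack: []string{"leaf", "caller", "main"}, Count: 3},
		{Stack: []string{"leaf", "caller", "other"}, Count: 4},
	}
	// Both stacks collapse to [leaf caller], so the counts merge to 7.
	for _, s := range truncateAndMerge(in, 2) {
		fmt.Println(s.Stack, s.Count)
	}
}
```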

A pprof profile is, for the purposes of PGO, effectively a table of 
execution stacks and how often each was sampled.  If you want really good 
profiling data, you do as the PGO guide tells you 
<https://go.dev/doc/pgo#notes> and collect multiple profiles and merge 
them.  That gets you more coverage, but also makes for larger, more varied 
sample counts, which decreases the effectiveness of compression.  For the 
purposes of PGO, we only care about the relative frequency of different 
code paths at a pretty coarse granularity.  There are two opportunities 
here.

Normalizing and quantizing the sample counts should be possible with no 
significant effect on the accuracy or usefulness to PGO, and would improve 
the effectiveness of compression.  That is, you could, for example, round 
each sample count to the nearest power of N, then scale them all so that 
the smallest sample count is N (where N is e.g. 2).  The effect of this 
would likely be minor, since most of the space in the profile is taken up 
by other things like the location and function tables, but it wouldn't 
hurt.
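
A quick sketch of that quantization (the choice of N, the example counts, 
and the rescaling policy are all just illustrative):

```go
package main

import (
	"fmt"
	"math"
)

// quantize rounds each count to the nearest power of n (rounding in log
// space) and rescales so the smallest surviving count is n itself. Counts
// are assumed positive; the absolute values are meaningless to PGO, only
// coarse relative weights matter.
func quantize(counts []int64, n float64) []int64 {
	minExp := math.Inf(1)
	exps := make([]float64, len(counts))
	for i, c := range counts {
		exps[i] = math.Round(math.Log(float64(c)) / math.Log(n))
		minExp = math.Min(minExp, exps[i])
	}
	out := make([]int64, len(counts))
	for i, e := range exps {
		// Shift every bucket down so the smallest becomes n^1.
		out[i] = int64(math.Pow(n, e-minExp+1))
	}
	return out
}

func main() {
	// 13 -> 2^4, 100 -> 2^7, 997 and 1024 -> 2^10, then shifted down
	// so the smallest bucket is 2: prints [2 16 128 128].
	fmt.Println(quantize([]int64{13, 100, 997, 1024}, 2))
}
```

Note that 997 and 1024 land in the same bucket and become identical values, 
which is exactly the kind of repetition the compressor can exploit.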

The other, much more complicated thing we could do is merge sampled 
locations.  PGO uses the profile data to improve its guesses about which 
branches are taken (including the implicit branches by type for interface 
method calls).  We generally don't care which specific statement within a 
branch is taking the most time.  If no branch can occur between two sampled 
locations, then from PGO's perspective one might as well merge them (e.g. 
fold the one with the lower sample count into the other).  This is much 
more complicated than quantization, of course, since it requires 
control-flow analysis.
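
To make the shape of that concrete, here's a toy version in which the hard 
part -- the control-flow question -- is stubbed out by a deliberately naive 
`sameStraightLine` predicate.  Everything here is hypothetical; a real tool 
would have to answer that question from the compiler's CFG or DWARF line 
info:

```go
package main

import "fmt"

// Loc is a sampled (function, line) pair with its flat sample count.
// Input is assumed sorted by function and line.
type Loc struct {
	Fn    string
	Line  int
	Count int64
}

// sameStraightLine is a stand-in for real control-flow analysis: it should
// report whether execution cannot branch between the two locations. This
// naive version assumes adjacent lines in one function fall through, which
// is wrong in general (calls, ifs, loops) -- hence the need for a CFG.
func sameStraightLine(a, b Loc) bool {
	return a.Fn == b.Fn && b.Line-a.Line == 1
}

// mergeStraightLine folds each location into its predecessor when no
// branch can separate them, keeping the higher-count line as the
// representative and summing the counts.
func mergeStraightLine(locs []Loc) []Loc {
	var out []Loc
	for _, l := range locs {
		if n := len(out); n > 0 && sameStraightLine(out[n-1], l) {
			if l.Count > out[n-1].Count {
				out[n-1].Fn, out[n-1].Line = l.Fn, l.Line
			}
			out[n-1].Count += l.Count
			continue
		}
		out = append(out, l)
	}
	return out
}

func main() {
	locs := []Loc{
		{"hot", 10, 5}, {"hot", 11, 9}, // straight-line neighbors: merged
		{"hot", 20, 3},                 // separated: kept as-is
	}
	fmt.Println(mergeStraightLine(locs))
}
```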

My questions, for anyone who's read this far:

   1. Would these ideas work, or am I making bad assumptions about what PGO 
   actually needs?
   2. Are there pre-existing tools for doing this kind of thing that I just 
   haven't noticed?
   3. Are there other significant opportunities for pruning the pprof data 
   in ways that wouldn't impact PGO?
   4. Would this be valuable enough to try to roll it into an option for 
   `go tool pprof -proto`?
   5. Any pointers for pre-existing tools for doing the control-flow 
   analysis bits?

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/235fbddf-c31a-4267-b0eb-26b42662a730n%40googlegroups.com.
