hoy added inline comments.

================
Comment at: llvm/include/llvm/IR/PseudoProbe.h:41
   //  [18:3] - probe id
-  //  [25:19] - reserved
+  //  [25:19] - probe distribution factor
   //  [28:26] - probe type, see PseudoProbeType
----------------
wmi wrote:
> hoy wrote:
> > wmi wrote:
> > > hoy wrote:
> > > > hoy wrote:
> > > > > wmi wrote:
> > > > > > The bits in discriminator is a scare resource. Have you considered 
> > > > > > using less bits to represent probe distribution factor? I guess it 
> > > > > > is possible that using a little more coarse grain distribution 
> > > > > > factor won't affect performance.
> > > > > That's a good point. We are using seven bits to represent [0, 100] so 
> > > > > that integral numbers can be distinguished. Yes, we could use fewer 
> > > > > bits to represent, say 4 bits to represent only even numbers. We 
> > > > > could also not use any bits here but instead use the distribution 
> > > > > factor of the outer block probes when the competition of those bits 
> > > > > are high. I can do an experiment to see how well that works.
> > > > On a second thought, using the distribution factor of block probes for 
> > > > call probe may not work well since a callsite may be surrounded by more 
> > > > than one block probes. 
> > > > 
> > > > We could use also fewer bits like 6 bits to encode even numbers in the 
> > > > range [0, 100], or 5 bits to encoding multiples of 3 in [0, 100]. I did 
> > > > a profile quality measurement with the even number encoding. It's OK 
> > > > overall except for two SPEC benchmarks. I guess it's a trade-off we'll 
> > > > have to take when there's a competition on those bits. 
> > > Could you elaborate a little bit about the case that a callsite is 
> > > surrounded by more than one block probe? Is it because bb merge like in 
> > > cfg simplification?
> > Yes, block merge in cfg simplification is a good example. Inlining can also 
> > end up with callee code and caller code in one block. Jump threading or 
> > other cfg optimizations that convert a conditional jump into an 
> > unconditional jump can result in block merge too.
> > 
> > So far our way to track block weight for blocks with multiple probes is to 
> > take the maximum count out of those probes. When it comes to tracking 
> > callsite count, it is handy and accurate to attach a dedicated distribution 
> > factor for each individual call. For example, when a call is inlined, the 
> > inlinee's probes will be cloned into the caller, and they will be prorated 
> > based on the callsite's dedicated distribution factor.
> Actually, I think we may be able to extend Discriminator and 
> PseudoProbeDwarfDiscriminator. To emit Discriminator into Dwarf, we need to 
> follow Dwarf standard about how many bits Discrminator is going to occupy. 
> But inside compiler, Discriminator is represented as MetaData so it hasn't to 
> be 32bits. For example, we can extend Discriminator MetaData to be 64bits or 
> even larger and specify only lower 32bits will be actually emitted into Dwarf 
> section. For intermediate information like distribution factors, we can put 
> it into the higher bits. 
That's a good idea, I like that. Actually we thought about that int the past 
and our concern was about memory cost since the discriminator filed in 
`DILexicalBlockFile` metadata is not optional. It is probably OK for pseudo 
probe since discriminators are only used for callsites. It might be a problem 
with -fdebug-info-for-profiling where discriminators can be used more often. 

It sounds to me extending the size of discriminator is desirable for pseudo 
probes and potentially FS-AFDO. It might be worth evaluating the cost at some 
time. What do you think?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93264/new/

https://reviews.llvm.org/D93264

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to