hoy added inline comments.
================ Comment at: llvm/include/llvm/IR/PseudoProbe.h:41 // [18:3] - probe id - // [25:19] - reserved + // [25:19] - probe distribution factor // [28:26] - probe type, see PseudoProbeType ---------------- wmi wrote: > hoy wrote: > > wmi wrote: > > > hoy wrote: > > > > hoy wrote: > > > > > wmi wrote: > > > > > > The bits in discriminator is a scare resource. Have you considered > > > > > > using less bits to represent probe distribution factor? I guess it > > > > > > is possible that using a little more coarse grain distribution > > > > > > factor won't affect performance. > > > > > That's a good point. We are using seven bits to represent [0, 100] so > > > > > that integral numbers can be distinguished. Yes, we could use fewer > > > > > bits to represent, say 4 bits to represent only even numbers. We > > > > > could also not use any bits here but instead use the distribution > > > > > factor of the outer block probes when the competition of those bits > > > > > are high. I can do an experiment to see how well that works. > > > > On a second thought, using the distribution factor of block probes for > > > > call probe may not work well since a callsite may be surrounded by more > > > > than one block probes. > > > > > > > > We could use also fewer bits like 6 bits to encode even numbers in the > > > > range [0, 100], or 5 bits to encoding multiples of 3 in [0, 100]. I did > > > > a profile quality measurement with the even number encoding. It's OK > > > > overall except for two SPEC benchmarks. I guess it's a trade-off we'll > > > > have to take when there's a competition on those bits. > > > Could you elaborate a little bit about the case that a callsite is > > > surrounded by more than one block probe? Is it because bb merge like in > > > cfg simplification? > > Yes, block merge in cfg simplification is a good example. Inlining can also > > end up with callee code and caller code in one block. Jump threading or > > other cfg optimizations that convert a conditional jump into an > > unconditional jump can result in block merge too. > > > > So far our way to track block weight for blocks with multiple probes is to > > take the maximum count out of those probes. When it comes to tracking > > callsite count, it is handy and accurate to attach a dedicated distribution > > factor for each individual call. For example, when a call is inlined, the > > inlinee's probes will be cloned into the caller, and they will be prorated > > based on the callsite's dedicated distribution factor. > Actually, I think we may be able to extend Discriminator and > PseudoProbeDwarfDiscriminator. To emit Discriminator into Dwarf, we need to > follow Dwarf standard about how many bits Discrminator is going to occupy. > But inside compiler, Discriminator is represented as MetaData so it hasn't to > be 32bits. For example, we can extend Discriminator MetaData to be 64bits or > even larger and specify only lower 32bits will be actually emitted into Dwarf > section. For intermediate information like distribution factors, we can put > it into the higher bits. That's a good idea, I like that. Actually we thought about that int the past and our concern was about memory cost since the discriminator filed in `DILexicalBlockFile` metadata is not optional. It is probably OK for pseudo probe since discriminators are only used for callsites. It might be a problem with -fdebug-info-for-profiling where discriminators can be used more often. It sounds to me extending the size of discriminator is desirable for pseudo probes and potentially FS-AFDO. It might be worth evaluating the cost at some time. What do you think? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D93264/new/ https://reviews.llvm.org/D93264 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits