david-xl wrote:

> And digging even deeper:
> 
> * FWIW I noticed that I only used `clang -c` as benchmark previously, should 
> have used `clang -c -O3` resulting in this:
> 
> ```
> Old-BFI:          insn: 37,821,687,947  (baseline)
> New-BFI:          insn: 38,133,312,923  +0.82%
> Old-BFI, no-cold: insn: 37,423,365,184  -1.05%
> New-BFI, no-cold: insn: 37,438,736,713  -1.01%
> ```
> 
> * The problematic part of the code that is inlined/not-inlined is actually 
> marked as `LLVM_UNLIKELY` here: 
> https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/DenseMap.h#L607
>   and intuitively one would think that it is probably best to not inline 
> `grow`  (as will happen with the new BFI version)
> * In practice not-inlining `grow` mostly ends up being worse because of 
> knock-on effects as far as I can tell. These are the inlining decisions I 
> noticed for the most prominent situation:
> 
> ```
>     - `grow` inlined (Old-BFI):
>          - Instruction::getMetadataImpl
>              -> Value::getMetadata           not inlined
>              -> DenseMap::operator[]         inlined
>              -> DenseMap::FindAndConstruct   inlined
>              -> DenseMap::InsertIntoBucket   not inlined, size
>                    likely too big with `grow` inlined here
>              -> DenseMap::grow               inlined
>     - `grow` inlined (New-BFI):
>          - Instruction::getMadataImpl
>              -> Value::getMetadata          inlined
>              -> DenseMap::operator[]        inlined
>              -> DenseMap::FindAndConstruct  not inlined, size
>              -> DenseMap::InsertIntoBucket  inlined
>              -> DenseMap::grow              not inlined
> ```
> 
> Though if you look into this then I would state that the code got better for 
> the wrong reasons! Not inlining `grow` is a sensible decision in this context 
> and the `LLVM_UNLIKELY` annotation makes sense (I actually added some 
> counters and see the unlikely branch taken somewhere between 1% and 10% of 
> the cases depending on inputs, so seems sensible).
> 
> Unfortunately the particular code here `getMetadataImpl` never inserts new 
> things into the map, but unfortunately `operator[]` gives you that behavior 
> by default so nobody noticed. So not inlining `InsertIntoBucket` happened to 
> be a great decision previously that the compiler did by accident without 
> having good data to support this. Now with better but still insufficient data 
> (as this is PGO) we happen to end up inlining `InsertIntoBucket` wasting code 
> size leading to worse inlining decisions further up the stack...
> 
> Long story short: This ended up another of the many stories where the 
> compiler cannot make the right inlining decisions without having actual 
> runtime data. It tends to make a better decision based on the data that ends 
> up being wrong anyway here. I'm gonna leave things as is and rather put up 
> some improvements to LLVM code instead!

Good analysis.

Your plan sounds good but lowering the cold threshold is probably a good thing 
to do with the new BFI. The FindAndConstruct method in DenseMap can probably be 
annotated with buitin_expect to indicate the insertion is a unlikely path.

https://github.com/llvm/llvm-project/pull/66285
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to