> On 3 Nov 2025, at 9:35 am, Kugan Vivekanandarajah <[email protected]> 
> wrote:
> 
> 
> Hi,
> 
> I've implemented hierarchical discriminators for AutoFDO.
> This improves AutoFDO profile accuracy:
> - Loop iterations are now uniquely identifiable in profile data.
> - The profile can distinguish which iteration of an unrolled loop executed
>   hotly, and so on.
> 
> The discriminator in AutoFDO is extended from 16 bits to 32 bits
> with three fields:
> 
>  - Base (12 bits): Traditional same-line disambiguation
>  - Pass1 (12 bits): Optimization context (loop versioning, inlining)
>  - Pass2 (8 bits): Code duplication (loop unrolling, peeling)
> 
> The inline context tracking (pass1 discriminators for inlining) is NOT added
> to this patch, as initial testing did not show performance improvements.
> 
> We could enable hierarchical discriminators conditionally via a compiler
> parameter or a specific compiler option if that is preferred.
> 
> Bootstrapped and regression tested. Initial testing on Spec2017 with AutoFDO 
> shows some good improvements. I am rerunning the full suite and will update 
> the results.
> 
> Is this OK?
> 
> Thanks,
> Kugan
> <0001-Implement-hierarchical-discriminators-for-AutoFDO.patch>
Hi,

I can split this patch into a patch series for easier review. Before that, I
have some feedback:

1. How do we want to divide the discriminator?
LLVM's approach:
Old way, when -enable-fs-discriminator is NOT set:
┌────────────────────┬────────────────────┬────────────────────┐
│ Base Discriminator │ Duplication Factor │ Copy Identifier    │
├────────────────────┼────────────────────┼────────────────────┤
│ (bits 0-7)         │ (middle bits)      │ (high bits)        │
└────────────────────┴────────────────────┴────────────────────┘

When -enable-fs-discriminator is set:
Bits [0-7]:    Base discriminator (Pass0)
Bits [8-13]:   FS Pass1 discriminator (6 bits)
Bits [14-19]:  FS Pass2 discriminator (6 bits)
Bits [20-25]:  FS Pass3 discriminator (6 bits)
Bits [26-31]:  FS Pass4 discriminator (6 bits)

The fields are used as follows:
Base Discriminator (bits 0-7): The same as the standard base discriminator.
Pass1 (bits 8-13): Tracks CFG changes before/during register allocation (RA).
Pass2 (bits 14-19): Tracks changes after RA but before block placement.
Pass3 (bits 20-25): Reserved.
Pass4 (bits 26-31): Tracks changes after all major transformations.
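
To make the FS layout above concrete, here is a minimal sketch in C of how the
fields could be extracted from a 32-bit discriminator. The bit positions are
the ones listed above; the names (fs_base, fs_pass and the enum constants) are
hypothetical and are not LLVM's actual API.

```c
#include <stdint.h>

/* Field widths from the layout above: 8 base bits followed by four
   6-bit FS pass fields.  All names here are hypothetical.  */
enum { FS_BASE_BITS = 8, FS_PASS_BITS = 6, FS_NUM_PASSES = 4 };

/* Return the base discriminator (bits 0-7).  */
static inline uint32_t
fs_base (uint32_t disc)
{
  return disc & ((1u << FS_BASE_BITS) - 1);
}

/* Return the field for FS pass PASS (1..4), or 0 for an invalid pass.
   Pass1 is bits 8-13, Pass2 bits 14-19, Pass3 bits 20-25,
   Pass4 bits 26-31.  */
static inline uint32_t
fs_pass (uint32_t disc, unsigned pass)
{
  if (pass < 1 || pass > FS_NUM_PASSES)
    return 0;
  unsigned shift = FS_BASE_BITS + (pass - 1) * FS_PASS_BITS;
  return (disc >> shift) & ((1u << FS_PASS_BITS) - 1);
}
```

For example, fs_pass (disc, 2) yields whatever was recorded after RA but
before block placement, independently of the other pass fields.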

This is useful when we reload the profile and readjust the profile annotation.
Given that, how do we want to divide the bits? In my implementation we have:
Base (12 bits): Traditional same-line disambiguation
Pass1 (12 bits): Optimization context (loop versioning, inlining)
Pass2 (8 bits): Code duplication (loop unrolling, peeling)
Since I am aggregating counts with the same line+base in a function, we don't
need a duplication factor here.
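
As a rough illustration of the 12/12/8 split and the line+base aggregation, a
sketch in C follows. It assumes Base sits in the low 12 bits, Pass1 in the next
12 and Pass2 in the top 8; the widths are as stated above, but that ordering is
an assumption. The names hd_pack, hd_sample and hd_aggregate are hypothetical
and are not taken from the patch, and summing across copies is only one
reading of "aggregating".

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: Base is bits 0-11, Pass1 bits 12-23, Pass2 bits 24-31.
   Only the widths are given in the mail; the order is a guess.  */
enum { HD_BASE_BITS = 12, HD_PASS1_BITS = 12, HD_PASS2_BITS = 8 };

/* Pack the three fields into one 32-bit discriminator; values that do
   not fit their field are truncated.  */
static inline uint32_t
hd_pack (uint32_t base, uint32_t pass1, uint32_t pass2)
{
  return (base & 0xfffu)
         | ((pass1 & 0xfffu) << HD_BASE_BITS)
         | ((pass2 & 0xffu) << (HD_BASE_BITS + HD_PASS1_BITS));
}

static inline uint32_t hd_base (uint32_t d) { return d & 0xfffu; }
static inline uint32_t hd_pass1 (uint32_t d) { return (d >> 12) & 0xfffu; }
static inline uint32_t hd_pass2 (uint32_t d) { return (d >> 24) & 0xffu; }

/* One profile sample within a function (hypothetical record).  */
struct hd_sample
{
  uint32_t line;   /* source line */
  uint32_t disc;   /* full hierarchical discriminator */
  uint64_t count;  /* sample count */
};

/* Sum the counts of all samples that share LINE and base discriminator
   BASE, ignoring the Pass1/Pass2 fields.  */
static uint64_t
hd_aggregate (const struct hd_sample *s, size_t n,
              uint32_t line, uint32_t base)
{
  uint64_t total = 0;
  for (size_t i = 0; i < n; i++)
    if (s[i].line == line && hd_base (s[i].disc) == base)
      total += s[i].count;
  return total;
}
```

With this kind of aggregation, copies of the same statement created by
unrolling or peeling differ only in Pass2, so their counts are summed directly
and no separate duplication factor has to be encoded.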

2. How do we handle two instructions from different source lines that are
combined during optimization?
LLVM's pseudo-probe can handle this. AFAIU, hierarchical discriminators cannot
handle this correctly without additional support. However, recording this as a
bit in the hierarchical discriminator would be useful: when we propagate
annotations, we would know that the profile count is double counted and should
not be propagated. Also, if we can infer counts from the CFG, should we use
that instead?


Thanks,

Kugan
