Re: [AutoFDO] Fix get_original_name to strip only names that are generated after auto-profile

Jan Hubicka Tue, 24 Jun 2025 00:26:24 -0700

> > 
> > With part suffixes we also may want to merge specially, since the
> > entry_count of the split part does not correspond to entry_count of the
> > original function.
> > 
> > I wonder, does partitioned function work with the google tool?  I
> > remember it had limitations in this respect.
> > 
> 
> Yes. Here are some examples.
> 
> _Z17expand_assignmentP9tree_nodeS0_b.part.0 total:7045 head:297
>   0: 297
>   20: 297
> 
> _Z17expand_assignmentP9tree_nodeS0_b total:1488 head:277
>   1: 277
>   9: 277
> Here, we should keep the head count as it is, since the head is for offset 1.


I actually had in mind the .cold partition
(-freorder-blocks-and-partition),
but this is interesting too.  We should track whether we stripped the .part
suffix and, in that case, not merge in the head counts.
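A minimal sketch of that tracking, in C++ (GCC's implementation language). The helper `strip_split_suffixes` and its result type are hypothetical and are not GCC's `get_original_name`; the set of suffixes it removes (`.cold` and `.part.N`) is an assumption chosen for illustration:

```cpp
#include <cctype>
#include <string>

// Hypothetical result: the stripped name plus a flag recording whether a
// ".part.N" component was removed along the way.
struct StrippedName {
  std::string name;
  bool stripped_part = false;
};

// True if S is non-empty and consists only of decimal digits.
static bool all_digits(const std::string &s) {
  if (s.empty()) return false;
  for (unsigned char c : s)
    if (!std::isdigit(c)) return false;
  return true;
}

// Hypothetical helper: repeatedly strip trailing ".cold" and ".part.N"
// suffixes (assumed set), remembering whether any ".part.N" was seen.
StrippedName strip_split_suffixes(std::string name) {
  StrippedName result;
  const std::string cold = ".cold";
  const std::string part = ".part";
  for (;;) {
    if (name.size() > cold.size() &&
        name.compare(name.size() - cold.size(), cold.size(), cold) == 0) {
      name.erase(name.size() - cold.size());
      continue;
    }
    // Look for ".part.<digits>" at the end of the name.
    auto last_dot = name.rfind('.');
    if (last_dot != std::string::npos && last_dot >= part.size() &&
        all_digits(name.substr(last_dot + 1)) &&
        name.compare(last_dot - part.size(), part.size(), part) == 0) {
      name.erase(last_dot - part.size());
      result.stripped_part = true;
      continue;
    }
    break;  // no recognized suffix left; other clone suffixes are kept
  }
  result.name = name;
  return result;
}
```

A caller would then consult `stripped_part` when deciding whether the profile's head count may be merged into the original function's.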

However, the number of invocations of
_Z17expand_assignmentP9tree_nodeS0_b.part.0
should always be strictly lower than the number of invocations of
_Z17expand_assignmentP9tree_nodeS0_b.

This is not reflected by the head counts, presumably because the second is
inlined into some contexts, so its executions are accounted separately
in those inline instances.

So merging the profiles will also lead to inconsistencies, making the
.part variant seem hotter than it is...
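To illustrate the head-count point, here is a sketch of merging a split part's profile into its parent that skips the head count when the source was a `.part.N` clone. `FunctionProfile` and `merge_profile` are hypothetical simplifications of the auto-profile data, and the sketch assumes both profiles use offsets relative to the same source function:

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified profile of one offline function: total samples,
// entry (head) count, and per-location counts keyed by "offset" or
// "offset.discriminator" as printed in the dumps.
struct FunctionProfile {
  std::uint64_t total = 0;
  std::uint64_t head = 0;
  std::map<std::string, std::uint64_t> body;
};

// Merge SRC into DST.  A ".part.N" clone's head count measures entries into
// the split part rather than into the original function, so it is not added
// to DST's head count in that case.
void merge_profile(FunctionProfile &dst, const FunctionProfile &src,
                   bool src_was_part) {
  dst.total += src.total;
  if (!src_was_part)
    dst.head += src.head;
  for (const auto &[loc, count] : src.body)
    dst.body[loc] += count;
}
```

With the expand_assignment numbers above, merging the .part.0 profile this way leaves the head count at 277 instead of raising it to 574.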

> 
> 
> _Z19recompute_dominator13cdi_directionP15basic_block_def.part.0 total:1182 head:13
>   0: 13
>   3: 13
>   11: 13
> 
> _Z19recompute_dominator13cdi_directionP15basic_block_def total:11 head:9
>   1: 0
>   3: 0
>   9: 9
> 
> Here also, we should keep the head count as it is, since the head is for offset 9.
> 
> _Z22init_attr_rdwr_indicesP8hash_mapI16rdwr_access_hash11attr_access21simple_hashmap_traitsI19default_hash_traitsIS0_ES1_EEP9tree_node.part.0 total:85 head:5
>   0: 8
>   11: 0
>   12: 0
>   16: 0
>   17: 0
>   18: 0
>   20: 0
>   21: 0
>   25: 0
>   25.1: 2
>   27: 2
>   30: 0
>   31: 0
>   34: 0
>   35: 2
>   38: 2
>   38.1: 2
>   39: 2
>   41: 2
>   46: 2
>   52.1: 0
>   54: 0
>   54.1: 0
>   56: 8
>   57: 0
>   59: 0
>   62: 0
>   63: 3
>   65: 0
>   71.1: 0
>   77: 0
>   78: 0
>   81: 3
>   84: 2
>   86: 0
>   89: 0
>   91: 0
>   92: 0
>   92.1: 0
>   98: 0
>   99: 0
>   103: 0
>   108: 0
>   108.1: 0
>   111: 0
>   114: 0
>   120: 1
>   124: 0
>   125: 0
>   127: 0
>   128: 0
>   130: 0
>   131: 0
>   134: 0
>   139: 0
>   140: 0
>   143: 1
>   6: lookup_attribute total:40
>     4: 5
> 
> 
> _Z22init_attr_rdwr_indicesP8hash_mapI16rdwr_access_hash11attr_access21simple_hashmap_traitsI19default_hash_traitsIS0_ES1_EEP9tree_node total:212 head:71
>   2: 71
> _Z22init_attr_rdwr_indicesP8hash_mapI16rdwr_access_hash11attr_access21simple_hashmap_traitsI19default_hash_traitsIS0_ES1_EEP9tree_node.part.0:5
>   143: 141
> 
> This looks odd. It looks like create_gcov is getting mixed up with the
> offsets of inlined functions.

I am not sure I follow what you mean here?
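For reference, the quoted dumps share a simple line layout: a header `name total:T head:H`, top-level lines `offset[.discriminator]: count`, and inline callsites `offset: callee total:T` followed by a more deeply indented body. Below is a hypothetical C++ reader for the header and top-level lines. The layout is inferred from the dumps themselves rather than from create_gcov's documentation. The reader makes it easy to check which location's count the head count matches:

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory form of one function from the text dump.
struct ParsedFunction {
  std::string name;
  std::uint64_t total = 0;
  std::uint64_t head = 0;
  // (location, count) pairs; location is "offset" or "offset.discriminator".
  std::vector<std::pair<std::string, std::uint64_t>> body;
};

// Parses "<name> total:<T> head:<H>" followed by top-level body lines.
// Inline callsites and their more deeply indented bodies are skipped.
std::optional<ParsedFunction> parse_function(const std::string &text) {
  std::istringstream in(text);
  std::string line;
  if (!std::getline(in, line)) return std::nullopt;

  ParsedFunction f;
  std::istringstream hdr(line);
  std::string total_tok, head_tok;
  if (!(hdr >> f.name >> total_tok >> head_tok)) return std::nullopt;
  if (total_tok.rfind("total:", 0) != 0 || head_tok.rfind("head:", 0) != 0)
    return std::nullopt;
  f.total = std::stoull(total_tok.substr(6));
  f.head = std::stoull(head_tok.substr(5));

  while (std::getline(in, line)) {
    auto indent = line.find_first_not_of(' ');
    if (indent == std::string::npos) continue;  // blank line
    if (indent > 2) continue;                   // inside an inline callsite
    std::istringstream ls(line);
    std::string loc, value;
    if (!(ls >> loc >> value) || loc.back() != ':') continue;
    // "6: lookup_attribute total:40" starts an inline callsite: skip it.
    if (value.find_first_not_of("0123456789") != std::string::npos) continue;
    loc.pop_back();  // drop the trailing ':'
    f.body.emplace_back(loc, std::stoull(value));
  }
  return f;
}

// Locations whose count equals the function's head count.
std::vector<std::string> locations_matching_head(const ParsedFunction &f) {
  std::vector<std::string> out;
  for (const auto &[loc, count] : f.body)
    if (count == f.head) out.push_back(loc);
  return out;
}
```

On the non-split recompute_dominator dump above, this reports only offset 9, matching the observation in the quoted text.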

This is my current benchmark run with -Ofast -mtune=native (without LTO),
comparing no feedback (base) with AutoFDO (peak):

                  Base (no feedback)            Peak (AutoFDO)
Benchmark         Copies   Time(s)    Rate      Copies   Time(s)    Rate
500.perlbench_r        1       167    9.51 *         1       155    10.3 *
502.gcc_r              1       132    10.7 *         1       126    11.2 *
505.mcf_r              1       226    7.16 *         1       225    7.20 *
520.omnetpp_r          1       203    6.47 *         1       203    6.47 *
523.xalancbmk_r                         NR                            NR
525.x264_r             1      84.7    20.7 *         1      90.7    19.3 *
531.deepsjeng_r        1       208    5.50 *         1       209    5.47 *
541.leela_r            1       295    5.61 *         1       318    5.21 *
548.exchange2_r        1      85.9    30.5 *         1      93.3    28.1 *
557.xz_r               1       225    4.79 *         1       220    4.90 *
Est. SPECrate2017_int_base            9.13
Est. SPECrate2017_int_peak                                          9.05

So there are regressions in x264, deepsjeng, leela and exchange2, none
of them very bad.  I think it would be interesting to understand
541.leela_r first.
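As a quick check on the numbers above, a small C++ sketch that recomputes the per-benchmark change and the overall estimates. It assumes, as SPEC does, that the overall metric is the geometric mean of the per-benchmark rates, taken here over the nine benchmarks that reported (523.xalancbmk_r is NR and left out):

```cpp
#include <cmath>
#include <string>
#include <vector>

// Per-benchmark rates copied from the table above.
struct Result {
  std::string name;
  double base_rate;
  double peak_rate;
};

const std::vector<Result> kResults = {
    {"500.perlbench_r", 9.51, 10.3}, {"502.gcc_r", 10.7, 11.2},
    {"505.mcf_r", 7.16, 7.20},       {"520.omnetpp_r", 6.47, 6.47},
    {"525.x264_r", 20.7, 19.3},      {"531.deepsjeng_r", 5.50, 5.47},
    {"541.leela_r", 5.61, 5.21},     {"548.exchange2_r", 30.5, 28.1},
    {"557.xz_r", 4.79, 4.90},
};

// Relative change of peak (AutoFDO) over base, in percent; negative values
// are regressions.
double percent_change(const Result &r) {
  return (r.peak_rate / r.base_rate - 1.0) * 100.0;
}

// Geometric mean of the base or peak rates, computed via the mean of logs.
double geomean(const std::vector<Result> &rs, bool peak) {
  double log_sum = 0.0;
  for (const auto &r : rs)
    log_sum += std::log(peak ? r.peak_rate : r.base_rate);
  return std::exp(log_sum / rs.size());
}
```

With these inputs the geometric means come out at about 9.128 and 9.049, consistent with the 9.13 and 9.05 estimates, and percent_change reports about -7.1% for leela (5.61 to 5.21).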

Honza
> 
> Thanks,
> Kugan
