On Thu, 31 Aug 2023, Filip Kastl wrote:

> > The most obvious places would be right after SSA construction and before 
> > RTL expansion.
> > Can you provide measurements for those positions?
> 
> The algorithm should only remove PHIs that break SSA form minimality. Since
> GCC's SSA construction already produces minimal SSA form, the algorithm isn't
> expected to remove any PHIs if run right after the construction. I even
> measured it and indeed -- no PHIs got removed (except for 502.gcc_r, where the
> algorithm managed to remove exactly 1 PHI, which is weird). 
> 
> I tried putting the pass before pass_expand. There isn't a lot of PHIs to
> remove at that point, but there still are some.

That's interesting.  Your placement at

          NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
          NEXT_PASS (pass_phiopt, true /* early_p */);
+         NEXT_PASS (pass_sccp);

and

       NEXT_PASS (pass_tsan);
       NEXT_PASS (pass_dse, true /* use DR analysis */);
       NEXT_PASS (pass_dce);
+      NEXT_PASS (pass_sccp);

isn't immediately after the "best" existing pass we have to
remove dead PHIs which is pass_cd_dce.  phiopt might leave
dead PHIs around and the second instance runs long after the
last CD-DCE.

So I wonder if your pass just detects unnecessary PHIs we'd have
removed by other means and what survives until RTL expansion is
what we should count?

Can you adjust your original early placement to right after
the cd-dce pass and for the late placement turn the dce pass
before it into cd-dce and re-do your measurements?

> 500.perlbench_r
> Started with 43111
> Ended with 42942
> Removed PHI % .39201131961680313700
> 
> 502.gcc_r
> Started with 141392
> Ended with 140455
> Removed PHI % .66269661649881181400
> 
> 505.mcf_r
> Started with 482
> Ended with 478
> Removed PHI % .82987551867219917100
> 
> 523.xalancbmk_r
> Started with 136040
> Ended with 135629
> Removed PHI % .30211702440458688700
> 
> 531.deepsjeng_r
> Started with 2150
> Ended with 2148
> Removed PHI % .09302325581395348900
> 
> 541.leela_r
> Started with 4664
> Ended with 4650
> Removed PHI % .30017152658662092700
> 
> 557.xz_r
> Started with 43
> Ended with 43
> Removed PHI % 0
> 
> > Can the pass somehow be used as part of propagations like during value 
> > numbering?
> 
> I don't think that the pass could be used as a part of different optimizations
> since it works on the whole CFG (except for copy propagation as I noted in the
> RFC). I'm adding Honza into Cc. He'll have more insight into this.
> 
> > Could the new file be called gimple-ssa-sccp.cc or something similar?
> 
> Certainly. Though I'm not sure, but wouldn't tree-ssa-sccp.cc be more
> appropriate?
> 
> I'm thinking about naming the pass 'scc-copy' and the file
> 'tree-ssa-scc-copy.cc'.
> 
> > Removing some PHIs is nice, but it would be also interesting to know what
> > are the effects on generated code size and/or performance.
> > And also if it has any effects on debug information coverage.
> 
> Regarding performance: I ran some benchmarks on a Zen3 machine with -O3 with
> and without the new pass. *I got ~2% speedup for 505.mcf_r and 541.leela_r.
> Here are the full results. What do you think? Should I run more benchmarks? Or
> benchmark multiple times? Or run the benchmarks on different machines?*
> 
> 500.perlbench_r    
> Without SCCP: 244.151807s    
> With SCCP: 242.448438s    
> -0.7025695913124297%    
>     
> 502.gcc_r    
> Without SCCP: 211.029606s    
> With SCCP: 211.614523s    
> +0.27640683243653763%    
>     
> 505.mcf_r    
> Without SCCP: 298.782621s    
> With SCCP: 291.671468s    
> -2.438069465197046%    
>     
> 523.xalancbmk_r    
> Without SCCP: 189.940639s    
> With SCCP: 189.876261s    
> -0.03390523894928332%    
>     
> 531.deepsjeng_r    
> Without SCCP: 250.63648s    
> With SCCP: 250.988624s    
> +0.1403027732444051%    
>     
> 541.leela_r    
> Without SCCP: 346.066278s    
> With SCCP: 339.692987s    
> -1.8761915152519792%
> 
> Regarding size: The pass doesn't seem to significantly reduce or increase the
> size of the result binary. The differences were at most ~0.1%.
> 
> Regarding debug info coverage: I didn't notice any additional guality 
> testcases
> failing after I applied the patch. *Is there any other way how I should check
> debug info coverage?*
> 
> 
> Filip K
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to