On Fri, 1 Sep 2023, Filip Kastl wrote:
> > That's interesting. Your placement at
> >
> > NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
> > NEXT_PASS (pass_phiopt, true /* early_p */);
> > + NEXT_PASS (pass_sccp);
> >
> > and
> >
> > NEXT_PASS (pass_tsan);
> > NEXT_PASS (pass_dse, true /* use DR analysis */);
> > NEXT_PASS (pass_dce);
> > + NEXT_PASS (pass_sccp);
> >
> > isn't immediately after the "best" existing pass we have to
> > remove dead PHIs which is pass_cd_dce. phiopt might leave
> > dead PHIs around and the second instance runs long after the
> > last CD-DCE.
> >
> > So I wonder if your pass just detects unnecessary PHIs we'd have
> > removed by other means and what survives until RTL expansion is
> > what we should count?
> >
> > Can you adjust your original early placement to right after
> > the cd-dce pass and for the late placement turn the dce pass
> > before it into cd-dce and re-do your measurements?
>
> So I did this
>
> NEXT_PASS (pass_dse);
> NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
> NEXT_PASS (pass_sccp);
> NEXT_PASS (pass_phiopt, true /* early_p */);
> NEXT_PASS (pass_tail_recursion);
>
> and this
>
> NEXT_PASS (pass_dse, true /* use DR analysis */);
> NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
> NEXT_PASS (pass_sccp);
> /* Pass group that runs when 1) enabled, 2) there are loops
>
> and got these results:
>
> 500.perlbench_r
> Started with (1) 30318
> Ended with (1) 26219
> Removed PHI % (1) 13.52002110957187149600
> Started with (2) 39043
> Ended with (2) 38941
> Removed PHI % (2) .26125041620777092000
>
> 502.gcc_r
> Started with (1) 148361
> Ended with (1) 140464
> Removed PHI % (1) 5.32282742769326170700
> Started with (2) 216209
> Ended with (2) 215367
> Removed PHI % (2) .38943799749316633500
>
> 505.mcf_r
> Started with (1) 342
> Ended with (1) 304
> Removed PHI % (1) 11.11111111111111111200
> Started with (2) 437
> Ended with (2) 433
> Removed PHI % (2) .91533180778032036700
>
> 523.xalancbmk_r
> Started with (1) 62995
> Ended with (1) 58289
> Removed PHI % (1) 7.47043416144138423700
> Started with (2) 134026
> Ended with (2) 133193
> Removed PHI % (2) .62152119737961291100
>
> 531.deepsjeng_r
> Started with (1) 1402
> Ended with (1) 1264
> Removed PHI % (1) 9.84308131241084165500
> Started with (2) 1928
> Ended with (2) 1920
> Removed PHI % (2) .41493775933609958600
>
> 541.leela_r
> Started with (1) 3398
> Ended with (1) 3060
> Removed PHI % (1) 9.94702766333137139500
> Started with (2) 4473
> Ended with (2) 4453
> Removed PHI % (2) .44712720769058797300
>
> 557.xz_r
> Started with (1) 47
> Ended with (1) 44
> Removed PHI % (1) 6.38297872340425532000
> Started with (2) 43
> Ended with (2) 43
> Removed PHI % (2) 0
>
> These measurements don't differ very much from the previous. It seems to me
> that phiopt does output some redundant PHIs but the vast majority of the
> eliminated PHIs are generated in earlier passes and cd_dce isn't able to get
> rid of them.
>
> A noteworthy information might be that most of the eliminated PHIs are
> actually
> trivial PHIs. I consider a PHI to be trivial if it only references itself or
> one other SSA name.
Ah. The early pass numbers are certainly intresting - can you elaborate
on the last bit? We have for example loop-closed PHI nodes like
_1 = PHI <_2>
and there are non-trivial degenerate PHIs like
_1 = PHI <_2, _2>
those are generally removed by value-numbering (FRE, DOM and PRE) and SSA
propagation (CCP and copyprop), they are not "dead" so CD-DCE doesn't
remove them.
But we do have passes removing these kind of PHIs.
The issue with the early pass is likely that we have
NEXT_PASS (pass_fre, true /* may_iterate */);
^^
would elimimate these kind of PHIs
NEXT_PASS (pass_early_vrp);
^^
rewrites into loop-closed SSA, adding many such PHIs
NEXT_PASS (pass_merge_phi);
NEXT_PASS (pass_dse);
NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
and until here there's no pass eliding the LC SSA PHIs.
You could add a pass_copy_prop after early_vrp, the later sccp
pass shouldn't run into this issue I think so it must be other
passes adding such kind of PHIs.
Maybe you can count single-argument PHIs, degenerate multi-arg PHIs
and "other" PHIs separately as you remove them?
> Here is a comparison of the newest measurements (sccp after cd_dce) with the
> previous ones (sccp after phiopt and dce):
>
> 500.perlbench_r
>
> Started with (1-PREV) 30287
> Started with (1-NEW) 30318
>
> Ended with (1-PREV) 26188
> Ended with (1-NEW) 26219
>
> Removed PHI % (1-PREV) 13.53385941162875161000
> Removed PHI % (1-NEW) 13.52002110957187149600
>
> Started with (2-PREV) 38005
> Started with (2-NEW) 39043
>
> Ended with (2-PREV) 37897
> Ended with (2-NEW) 38941
>
> Removed PHI % (2-PREV) .28417313511380081600
> Removed PHI % (2-NEW) .26125041620777092000
>
> 502.gcc_r
>
> Started with (1-PREV) 148187
> Started with (1-NEW) 148361
>
> Ended with (1-PREV) 140292
> Ended with (1-NEW) 140464
>
> Removed PHI % (1-PREV) 5.32772780338356266100
> Removed PHI % (1-NEW) 5.32282742769326170700
>
> Started with (2-PREV) 211479
> Started with (2-NEW) 216209
>
> Ended with (2-PREV) 210635
> Ended with (2-NEW) 215367
>
> Removed PHI % (2-PREV) .39909399987705635100
> Removed PHI % (2-NEW) .38943799749316633500
>
>
> Filip K
>
>
> P.S. I made a small mistake and didn't compute the benchmark speedup
> percentages right in the previous email. Here are the corrected results. The
> correct percentages are a little bit smaller but very similar. There is still
> a
> ~2% speedup with 505.mcf_r and 541.leela_r.
>
> 500.perlbench_r
> Without SCCP: 244.151807s
> With SCCP: 242.448438s
> -0.6976679881791663%
>
> 502.gcc_r
> Without SCCP: 211.029606s
> With SCCP: 211.614523s
> +0.27717295742853737%
>
> 505.mcf_r
> Without SCCP: 298.782621s
> With SCCP: 291.671468s
> -2.380042378703145%
>
> 523.xalancbmk_r
> Without SCCP: 189.940639s
> With SCCP: 189.876261s
> -0.03389374719330334%
>
> 531.deepsjeng_r
> Without SCCP: 250.63648s
> With SCCP: 250.988624s
> +0.14049989849840747%
>
> 541.leela_r
> Without SCCP: 346.066278s
> With SCCP: 339.692987s
> -1.8416388435281157%
>
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)