https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789

--- Comment #18 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #10)
> (In reply to Kewen Lin from comment #9)
> > (In reply to Richard Biener from comment #8)
> > > (In reply to Kewen Lin from comment #7)
> > > > Two questions in mind, need to dig into it further:
> > > >   1) from the assembly of scalar/vector code, I don't see any stores
> > > > needed into temp array d (array diff in pixel_sub_wxh), but when
> > > > modeling we consider the stores.
> > > 
> > > Because when modeling they are still there.  There's no good way around 
> > > this.
> > > 
> > 
> > I noticed the stores get eliminated during FRE.  Can we consider running FRE
> > once just before SLP? a bad idea due to compilation time?
> 
> Yeah, we already run FRE a lot and it is one of the more expensive passes.
> 
> Note there's one point we could do better which is the embedded SESE FRE
> run from cunroll which is only run before we consider peeling an outer loop
> and thus not for the outermost unrolled/peeled code (but the question would
> be from where / up to what to apply FRE to).  On x86_64 this would apply to
> the unvectorized but then unrolled outer loop from pixel_sub_wxh which feeds
> quite bad IL to the SLP pass (but that shouldn't matter too much, maybe it
> matters for costing though).

Following this idea, I relaxed the restriction on loop_outer (loop_father)
when setting father_bbs, and I can see that FRE works as expected.  But it
actually runs rpo_vn from cfun's entry to its exit.  If that is considered
too costly, we could probably guard it so that it only takes effect when all
of the inner loops are unrolled; in this case, all three of its loops are
unrolled.
Besides, when SLP happens, FRE generates the bit_field_ref and removes array
d, but the scalar code needs one more DSE run after cunroll to get array d
eliminated.  I guess that's not costly?  Can whether one pass runs be
controlled by something in another pass?  Doing it via a global variable plus
an extra parameter in passes.def seems weird.  If it is costly, we could
probably factor out a routine to be called instead of running a whole pass,
like do_rpo_vn?
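
To make the point about array d concrete, here is a minimal sketch of the
pattern.  It is not the pixel_sub_wxh source; the function names, the size N
and the abs-sum reduction are all made up for illustration.  The first
function stores into a local temporary and reads it back; the second is what
is left once the stored values are forwarded to the loads and the then-dead
stores are dropped, which is roughly what FRE followed by DSE would be
expected to achieve after full unrolling.

```c
#include <stdint.h>

/* Hypothetical stand-in for the pattern discussed above, not x264 code. */
enum { N = 4 };

/* Writes the differences into a local temporary array, then reads them back. */
int sum_abs_diff_via_temp(const uint8_t *a, const uint8_t *b)
{
    int d[N];                   /* temporary array, playing the role of "d" */
    for (int i = 0; i < N; i++)
        d[i] = a[i] - b[i];     /* stores whose values can be forwarded */

    int sum = 0;
    for (int i = 0; i < N; i++)
        sum += d[i] < 0 ? -d[i] : d[i];   /* loads from the temporary */
    return sum;
}

/* Same computation with the stored values forwarded by hand: no array left. */
int sum_abs_diff_direct(const uint8_t *a, const uint8_t *b)
{
    int sum = 0;
    for (int i = 0; i < N; i++) {
        int t = a[i] - b[i];
        sum += t < 0 ? -t : t;
    }
    return sum;
}
```

Both functions return the same value for the same inputs; the only difference
is whether the temporary array survives in the IL that the costing sees.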
