[Bug tree-optimization/82604] [8 Regression] SPEC CPU2006 410.bwaves ~50% performance regression with trunk@253679 when ftree-parallelize-loops is used

amker at gcc dot gnu.org Thu, 18 Jan 2018 02:31:42 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604


--- Comment #14 from amker at gcc dot gnu.org ---
(In reply to rguent...@suse.de from comment #13)
> On Thu, 18 Jan 2018, amker at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604
> > 
> > --- Comment #12 from amker at gcc dot gnu.org ---
> > (In reply to rguent...@suse.de from comment #11)
> > > 
> > Yes, this can be done.  For now, it's disabled because, without classifying
> > the zeroing stmt as a builtin partition, it gets fused due to the shared memory
> > reference to y(l,i,j,k).  This step could be done through cost model changes.  The
> > only problem is that such a cost model change doesn't make sense here (without
> > considering the builtin partition stuff, it should be fused, right?)
> 
> It might be profitable to distribute away stores that have no dependent
> stmts (thus stores from invariants).
> 
> Another heuristic would be to never merge builtin partitions with
> other partitions because loop optimizations do not handle memory

Together with the last sentence of your comment: IIUC, what we want is still
a builtin partition distributed from the original loop.  The only difference
is that the loop nest of the zeroing stmt is now distributed into an (outer)
loop containing a memset call, rather than a single memset call.  Of course,
it would be even better if it could be distributed into a single memset.
Currently, such a loop nest is not classified as a builtin partition in this case.

> builtins (the data dependence limitation).  Which might also be a reason
> not to handle those as builtins but revert to a non-builtin
> classification.
But I don't quite follow this sentence: why not handle it as a builtin?  It is
special, but eventually we want to distribute it into a memset (inside a loop
nest), right?

Thanks
> 
> I suppose implementing both and then looking at what distributions
> change due to them on say SPEC CPU 2006, classifying each change
> as either good or bad is the only way we'd know whether such
> cost model change is wanted.
> 
> > > And then do memset replacement in the first loop.
> > I guess this step is as hard as what I mentioned?  We still need to prove
> > that the loops of the zeroing statement don't leave a bubble in memory.
> 
> No, you'd simply have the i and j loops containing a memset...
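A minimal C sketch of the two distributed forms discussed above, under stated assumptions: the flat array layout, the sizes, and the function names (idx, zero_loop_nest, zero_memset_in_loops, zero_single_memset) are hypothetical and are not taken from bwaves or from GCC's loop distribution pass. It illustrates why keeping the outer loops and turning only the innermost contiguous loop into a memset is always valid, whereas a single memset over the whole nest is only equivalent when the zeroed rows are adjacent in memory (no "bubble"). It also assumes IEEE 754 doubles, where +0.0 is all-zero bits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat layout: y[(j * ni + i) * ld + l], with l the contiguous
   (leading) index, like y(l,...) in Fortran.  Only l < nl is zeroed, so when
   ld > nl every row leaves an untouched gap ("bubble"). */
size_t idx(size_t j, size_t i, size_t l, size_t ni, size_t ld) {
  return (j * ni + i) * ld + l;
}

/* Original form: a scalar zeroing store inside the whole loop nest. */
void zero_loop_nest(double *y, size_t nj, size_t ni, size_t nl, size_t ld) {
  for (size_t j = 0; j < nj; j++)
    for (size_t i = 0; i < ni; i++)
      for (size_t l = 0; l < nl; l++)
        y[idx(j, i, l, ni, ld)] = 0.0;
}

/* Distributed form: the j and i loops stay, the innermost loop becomes a
   memset of one row.  Equivalent to the original for any ld >= nl.
   (Assumes +0.0 is all-zero bits, as in IEEE 754.) */
void zero_memset_in_loops(double *y, size_t nj, size_t ni, size_t nl, size_t ld) {
  for (size_t j = 0; j < nj; j++)
    for (size_t i = 0; i < ni; i++)
      memset(&y[idx(j, i, 0, ni, ld)], 0, nl * sizeof *y);
}

/* Single memset: only equivalent when the zeroed rows are adjacent
   (ld == nl).  With ld > nl it would also clear the gap elements that the
   original nest never writes, so the caller must prove there is no bubble. */
void zero_single_memset(double *y, size_t nj, size_t ni, size_t nl, size_t ld) {
  assert(ld == nl && "zeroed region must be contiguous");
  memset(y, 0, nj * ni * nl * sizeof *y);
}
```

With ld > nl, the per-row version leaves the gap elements untouched, just like the original nest; a single memset would clear them as well, which is exactly the case the no-bubble proof has to rule out.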
