https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604
--- Comment #14 from amker at gcc dot gnu.org ---
(In reply to rguent...@suse.de from comment #13)
> On Thu, 18 Jan 2018, amker at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604
> >
> > --- Comment #12 from amker at gcc dot gnu.org ---
> > (In reply to rguent...@suse.de from comment #11)
> >
> > Yes, this can be done.  For now, it's disabled because without classifying
> > zeroing stmt as a builtin partition, it's fused because of shared memory
> > reference to y(l,i,j,k).  This step can be made by cost model changes.  The
> > only problem is the cost model change doesn't make sense here (without
> > considering builtin partition stuff, it should be fused, right?)
>
> It might be profitable to distribute away stores that have no dependent
> stmts (thus stores from invariants).
>
> Another heuristic would be to never merge builtin partitions with
> other partitions because loop optimizations do not handle memory

Together with the last sentence of your comment: IIUC, what we want to do is
still a builtin partition distribution from the original loop.  The only
difference is that the loop nest of the zeroing stmt is now distributed into
an (outer) loop of memset calls, rather than a single memset call.  Of course,
it would be even better if it could be distributed into a single memset.
Currently such a loop nest is not classified as a builtin partition in this
case.

> builtins (the data dependence limitation).  Which might also be a reason
> not to handle those as builtins but revert to a non-builtin
> classification.

But I don't quite follow this sentence: why not handle it as a builtin?  It is
special, but eventually we want to distribute it into memset (in a loop nest),
right?

Thanks

>
> I suppose implementing both and then looking at what distributions
> change due to them on say SPEC CPU 2006, classifying each change
> as either good or bad is the only way we'd know whether such
> cost model change is wanted.
> >
> > And then do memset replacement in the first loop.
> > I guess this step is equally hard to what I mentioned?  We still need to
> > prove loops of zeroing statement doesn't leave bubble in memory.
>
> No, you'd simply have the i and j loops containing a memset...
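
For readers following along, here is a rough C sketch of the shape being
discussed: the zeroing loop nest kept as outer loops around a memset, instead
of being collapsed into one memset call.  It is not the test case from this
PR (that one is Fortran); the array extents, the PAD elements standing in for
the "bubble", and all function names are made up for illustration.  It also
assumes all-zero bytes represent 0.0, as on IEEE 754 targets.

```c
#include <string.h>

/* Hypothetical extents.  PAD models trailing elements of each row that the
   zeroing loop never touches, i.e. the "bubble" that rules out one big
   memset over the whole array. */
enum { NJ = 4, NI = 5, NL = 6, PAD = 2 };

typedef double grid_t[NJ][NI][NL + PAD];

/* Original form: the zeroing statement sits in the innermost loop. */
static void zero_elementwise(grid_t y)
{
    for (int j = 0; j < NJ; j++)
        for (int i = 0; i < NI; i++)
            for (int l = 0; l < NL; l++)
                y[j][i][l] = 0.0;
}

/* Distributed form: only the innermost loop is replaced by memset, while the
   j and i loops remain.  Each call covers one contiguous run of NL doubles,
   so nothing has to be proven about gaps between rows. */
static void zero_memset_rows(grid_t y)
{
    for (int j = 0; j < NJ; j++)
        for (int i = 0; i < NI; i++)
            memset(y[j][i], 0, NL * sizeof y[j][i][0]);
}

/* Returns 1 if both forms leave identical contents, padding included. */
int distributed_matches_fused(void)
{
    static grid_t a, b;

    /* Fill everything, padding too, with a nonzero value first. */
    for (int j = 0; j < NJ; j++)
        for (int i = 0; i < NI; i++)
            for (int l = 0; l < NL + PAD; l++)
                a[j][i][l] = b[j][i][l] = 1.0;

    zero_elementwise(a);
    zero_memset_rows(b);
    return memcmp(a, b, sizeof a) == 0;
}
```

A single memset over the whole array would also clear the PAD elements, which
is why that variant needs the extra contiguity proof and the per-row one does
not.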