On Mon, Apr 23, 2018 at 2:31 PM, Janne Blomqvist <blomqvist.ja...@gmail.com> wrote:
> On Mon, Apr 23, 2018 at 2:02 PM, Richard Biener <richard.guent...@gmail.com>
> wrote:
>>
>> On Mon, Apr 23, 2018 at 12:59 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> > On Sun, Apr 22, 2018 at 3:27 PM, Toon Moene <t...@moene.org> wrote:
>> >> A few days ago there was a rant on the Fortran Standardization
>> >> Committee's e-mail list about Fortran's "whole array arithmetic"
>> >> being unoptimizable.
>> >>
>> >> An example picked at random from our weather forecasting code:
>> >>
>> >> ZQICE(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YI%MP)
>> >> ZQLI(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YL%MP)
>> >> ZQRAIN(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YR%MP)
>> >> ZQSNOW(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YS%MP)
>> >>
>> >> The reaction from one of the members of the committee (about "their"
>> >> compiler):
>> >>
>> >> 'And multiple consecutive array statements with the same shape are
>> >> “fused” exactly so that the compiler can generate good cache use.
>> >> This sort of optimization is pretty low hanging fruit.'
>> >>
>> >> As far as I can see, loop fusion as a stand-alone optimization is not
>> >> supported as yet, although it is mentioned in the context of
>> >> Graphite.
>> >>
>> >> Is this something that should be pursued?
>> > Hi,
>> > I don't know the current status of fusion in Graphite. As for the
>> > traditional fusion transformation, I think it would not be very
>> > difficult to implement alongside the existing distribution pass;
>> > actually, quite a lot of code should be shared. What we do need is
>> > more motivating cases and a good/conservative cost model.
>>
>> Yes, I guess before distribution you want to do maximum fusion and then
>> apply (re-)distribution on the fused loop. The cost model should be the
>> very same for distribution/fusion.
>>
>> Richard.
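To make the quoted example concrete, here is a hedged sketch of what fusing those four array statements means: first the statements as written, then a single loop nest doing all four copies. The module and argument names are hypothetical (`mi`, `ml`, `mr`, `ms` stand in for `YI%MP`, `YL%MP`, `YR%MP`, `YS%MP`), and `PGFL` is assumed to be a rank-3 array whose last index selects the field. Fusion is legal here because each statement writes a different array and only reads `PGFL`.

```fortran
! Sketch only: module and argument names are hypothetical.
! mi, ml, mr, ms stand in for YI%MP, YL%MP, YR%MP, YS%MP.
module fusion_sketch
  implicit none
contains

  ! Unfused: the four array statements as written. Scalarized one by
  ! one, they become four separate loop nests.
  subroutine copy_unfused(nproma, nflevg, pgfl, mi, ml, mr, ms, &
                          zqice, zqli, zqrain, zqsnow)
    integer, intent(in) :: nproma, nflevg, mi, ml, mr, ms
    real, intent(in)    :: pgfl(:,:,:)
    real, intent(out)   :: zqice(nproma,nflevg), zqli(nproma,nflevg), &
                           zqrain(nproma,nflevg), zqsnow(nproma,nflevg)

    zqice(1:nproma,1:nflevg)  = pgfl(1:nproma,1:nflevg,mi)
    zqli(1:nproma,1:nflevg)   = pgfl(1:nproma,1:nflevg,ml)
    zqrain(1:nproma,1:nflevg) = pgfl(1:nproma,1:nflevg,mr)
    zqsnow(1:nproma,1:nflevg) = pgfl(1:nproma,1:nflevg,ms)
  end subroutine copy_unfused

  ! Fused: one loop nest performs all four copies per (jl, jk) point.
  ! No statement reads what another writes, and Fortran's argument
  ! rules let the compiler assume the dummy arrays do not overlap.
  subroutine copy_fused(nproma, nflevg, pgfl, mi, ml, mr, ms, &
                        zqice, zqli, zqrain, zqsnow)
    integer, intent(in) :: nproma, nflevg, mi, ml, mr, ms
    real, intent(in)    :: pgfl(:,:,:)
    real, intent(out)   :: zqice(nproma,nflevg), zqli(nproma,nflevg), &
                           zqrain(nproma,nflevg), zqsnow(nproma,nflevg)
    integer :: jl, jk

    do jk = 1, nflevg
      do jl = 1, nproma
        ! innermost loop runs over the first (contiguous) index
        zqice(jl,jk)  = pgfl(jl,jk,mi)
        zqli(jl,jk)   = pgfl(jl,jk,ml)
        zqrain(jl,jk) = pgfl(jl,jk,mr)
        zqsnow(jl,jk) = pgfl(jl,jk,ms)
      end do
    end do
  end subroutine copy_fused

end module fusion_sketch
```

The sketch makes no performance claim: in this particular example the four statements share no operands, so whether the fused nest is faster depends on the target, which is the cost-model question raised above.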
>
> I recall Fujitsu bragging that the key to them getting good application
> performance (read: outside LINPACK) on the K computer is extensive use
> of loop FISSION + software pipelining. Though I guess sw-pipelining is
> only useful if you have lots of architectural registers, which
> disqualifies x86-64.
FISSION we can do quite well (though we lack a cost model here); that's
what loop distribution does.

Richard.

>
> --
> Janne Blomqvist
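As a hedged, source-level sketch of what loop distribution (fission) does, here is a loop that mixes an independent statement with a recurrence, split into two loops. The names are hypothetical and this illustrates only the effect of the transformation, not how GCC implements it. The split is legal because S2 only consumes the `a(i)` that S1 produced in the same iteration and S1 never reads `d`, so running the whole first loop before the second respects every dependence.

```fortran
! Sketch only: names are hypothetical; this shows the source-level
! effect of loop distribution, not GCC's implementation.
module distribution_sketch
  implicit none
contains

  ! Original loop: S1 is independent per iteration, S2 is a recurrence.
  subroutine mixed_loop(n, b, c, a, d)
    integer, intent(in) :: n
    real, intent(in)    :: b(n), c(n)
    real, intent(inout) :: a(n), d(n)
    integer :: i
    do i = 2, n
      a(i) = b(i) * c(i)        ! S1: no loop-carried dependence
      d(i) = d(i-1) + a(i)      ! S2: depends on d(i-1) and on S1
    end do
  end subroutine mixed_loop

  ! Distributed: S1 and S2 each get their own loop over the same range.
  subroutine distributed_loops(n, b, c, a, d)
    integer, intent(in) :: n
    real, intent(in)    :: b(n), c(n)
    real, intent(inout) :: a(n), d(n)
    integer :: i
    do i = 2, n
      a(i) = b(i) * c(i)        ! S1 alone: a candidate for vectorization
    end do
    do i = 2, n
      d(i) = d(i-1) + a(i)      ! S2 alone: the sequential recurrence
    end do
  end subroutine distributed_loops

end module distribution_sketch
```

After the split, the first loop can be vectorized on its own while the recurrence stays scalar; deciding when such a split (or the reverse fusion) is profitable is the missing cost model mentioned in the reply.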