Re: Loop fusion.

Richard Biener Mon, 23 Apr 2018 05:47:48 -0700

On Mon, Apr 23, 2018 at 2:31 PM, Janne Blomqvist
<blomqvist.ja...@gmail.com> wrote:
> On Mon, Apr 23, 2018 at 2:02 PM, Richard Biener <richard.guent...@gmail.com>
> wrote:
>>
>> On Mon, Apr 23, 2018 at 12:59 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> > On Sun, Apr 22, 2018 at 3:27 PM, Toon Moene <t...@moene.org> wrote:
>> >> A few days ago there was a rant on the Fortran Standardization
>> >> Committee's
>> >> e-mail list about Fortran's "whole array arithmetic" being
>> >> unoptimizable.
>> >>
>> >> An example picked at random from our weather forecasting code:
>> >>
>> >>     ZQICE(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YI%MP)
>> >>     ZQLI(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YL%MP)
>> >>     ZQRAIN(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YR%MP)
>> >>     ZQSNOW(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YS%MP)
>> >>
>> >> The reaction from one of the members of the committee (about "their"
>> >> compiler):
>> >>
>> >> 'And multiple consecutive array statements with the same shape are
>> >> “fused”
>> >> exactly so that the compiler can generate good cache use. This sort of
>> >> optimization is pretty low hanging fruit.'
>> >>
>> >> As far as I can see loop fusion as a stand-alone optimization is not
>> >> supported as yet, although some mention is made in the context of
>> >> graphite.
>> >>
>> >> Is this something that should be pursued ?
>> > Hi,
>> > I don't know the current status of fusion in graphite.  As for
>> > traditional fusion transformation, I think it's not very difficult to
>> > be implemented along with existing distribution, actually, quite lot
>> > of code should be shared.  What we do need are something like: more
>> > motivation cases, good/conservative cost model.
>>
>> Yes, I guess before distribution you want to do maximum fusion and then
>> apply (re-)distribution on the fused loop.  The cost model should be the
>> very same for distribution/fusion.
>>
>> Richard.
>
>
>
> I recall Fujitsu bragging that the key to them getting good application
> performance (read: outside linpack) on the K computer is extensive use of
> loop FISSION + software pipelining. Though I guess sw-pipelining is only
> useful if you have lots of architectural registers, which disqualifies
> x86-64..


FISSION we can do quite well (though we lack a cost model here), that's
what loop distribution does.

Richard.

>
> --
> Janne Blomqvist

Re: Loop fusion.

Reply via email to