Hi,

I ran the following script to gather vectorization data with trunk
(as of 20100615) and with today's Graphite branch.

for i in `ls -1 *.f90`; do
    echo -n $i
    $FC $OPT -c ./$i &> out
    grep "LOOP VECTORIZED" out | wc
done
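
The same measurement can be written so that each file yields a single
number instead of the three fields that plain wc prints. This is only a
sketch under assumptions: the function names count_vectorized and
report_counts are hypothetical, and FC and OPT are expected in the
environment exactly as in the script above.

```shell
#!/bin/bash
# Sketch: print "<file> <count>" for each Fortran source in the current
# directory. FC and OPT come from the environment, as in the original script.

# Count the "LOOP VECTORIZED" lines in a vectorizer log file.
count_vectorized() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c "LOOP VECTORIZED" "$1"
}

# Compile every *.f90 file and report the per-file count.
report_counts() {
    local src log
    log=$(mktemp)
    for src in *.f90; do
        # With no .f90 files the glob stays literal, so skip it.
        [ -e "$src" ] || continue
        # $OPT is intentionally unquoted so that it splits into flags.
        $FC $OPT -c "./$src" > "$log" 2>&1
        printf '%-16s%s\n' "$src" "$(count_vectorized "$log")"
    done
    rm -f "$log"
}
```

After exporting FC and OPT and sourcing the file, calling report_counts
prints one column of the table below for the chosen configuration.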

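The columns below give the number of lines reported by wc, that is,
the number of "LOOP VECTORIZED" messages per file. Trunk0 and Trunk1
were compiled with trunk, Gr0 and Gr1 with the Graphite branch: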

Trunk0: OPT="-ftree-vectorizer-verbose=2 -O3 -ffast-math"
Trunk1: OPT="-ftree-vectorizer-verbose=2 -O3 -ffast-math -fgraphite-identity"
Gr0: OPT="-ftree-vectorizer-verbose=2 -O3 -ffast-math"
Gr1: OPT="-ftree-vectorizer-verbose=2 -O3 -ffast-math -fgraphite-identity -fno-loop-strip-mine -fno-loop-interchange -fno-loop-block"

                Trunk0  Trunk1  Gr0     Gr1
ac.f90          30      30      29      29
aermod.f90      151     110     147     147
air.f90         4       3       4       4
capacita.f90    17      11      13      13
channel.f90     15      14      14      14
doduc.f90       155     146     155     155
fatigue.f90     15      15      15      15
gas_dyn.f90     44      42      41      41
induct.f90      9       5       5       5
linpk.f90       14      3       14      14
mdbx.f90        12      8       12      12
nf.f90          51      34      50      50
protein.f90     31      31      31      31
rnflow.f90      87      75      85      85
test_fpu.f90    80      65      78      78
tfft.f90        4       3       4       4

Overall, the recent changes that I pushed to the Graphite branch,
which should be stable by now, improve the vectorization of loops
generated by Graphite: Gr1 matches Gr0 on every file, whereas Trunk1
loses vectorized loops relative to Trunk0 on most files.

The improvement of today's Graphite branch (Gr1) over trunk with
-fgraphite-identity (Trunk1) is the difference Gr1 - Trunk1
(higher means more loops vectorized by Gr1):

ac.f90          -1
aermod.f90      37
air.f90         1
capacita.f90    2
channel.f90     0
doduc.f90       9
fatigue.f90     0
gas_dyn.f90     -1
induct.f90      0
linpk.f90       11
mdbx.f90        4
nf.f90          16
protein.f90     0
rnflow.f90      10
test_fpu.f90    13
tfft.f90        1

Some vectorization cases are still missed on the Graphite branch; see
the difference Trunk0 - Gr0 (higher means more loops missed by Gr0):

ac.f90          1
aermod.f90      4
air.f90         0
capacita.f90    4
channel.f90     1
doduc.f90       0
fatigue.f90     0
gas_dyn.f90     3
induct.f90      4
linpk.f90       0
mdbx.f90        0
nf.f90          1
protein.f90     0
rnflow.f90      2
test_fpu.f90    2
tfft.f90        0
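
The two difference tables above are plain column subtractions on the
results table, so they can be recomputed mechanically. A minimal sketch,
assuming the results table has been saved to a file (the function name
column_diff and the file name table.txt are hypothetical):

```shell
#!/bin/bash
# column_diff A B: read rows "name Trunk0 Trunk1 Gr0 Gr1" on stdin and
# print "name  (column A - column B)". Columns are counted from 1, with
# the file name as column 1, so Trunk0=2, Trunk1=3, Gr0=4, Gr1=5.
column_diff() {
    # Rows without five fields or without a numeric second field
    # (such as the header line) are skipped.
    awk -v a="$1" -v b="$2" \
        'NF >= 5 && $2 ~ /^[0-9]+$/ { printf "%-16s%d\n", $1, $a - $b }'
}
```

With this, "column_diff 5 3 < table.txt" gives Gr1 - Trunk1 and
"column_diff 2 4 < table.txt" gives Trunk0 - Gr0.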

After these changes are merged to trunk, we should revisit the
following PRs:

http://gcc.gnu.org/PR38846: 35% slower using -floop* than without graphite
http://gcc.gnu.org/PR40979: induct benchmark 60% slower when compiled with -fgraphite
http://gcc.gnu.org/PR43359: gas_dyn benchmark exhibits missed vectorization with graphite

Sebastian Pop
--
AMD / Open Source Compiler Engineering / GNU Tools
