https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68600
--- Comment #7 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> ---
(In reply to Dominique d'Humieres from comment #6)
> Note a problem when 16x16 matrices are inlined with -mavx (I'll investigate
> and file a PR for it).

That's a good find!

I ran this locally on Haswell and get the following numbers, including
openblas and libxsmm:

./a.out
 Size   Loops   Matmul  newmatmul   libxsmm  openblas
======================================================
    2  200000    1.562      0.107     0.104     0.139
    4  200000    6.781      0.779     1.012     0.887
    8  200000    7.424      3.360     6.150     4.732
   16  200000    2.954      7.290    14.421    11.527
   32  200000   10.401     10.251    24.396    18.071
   64   30757   12.696     14.196    27.385    24.547
  128    3829    8.646     17.684    31.460    31.530
  256     477    7.834     19.123    37.457    37.471
  512      59    8.064     19.473    40.738    40.755
 1024       7    8.334     19.475    40.931    41.112
 2048       1    3.042     19.157    41.225    41.279

So the 'newmatmul' code gets about 50% of peak. Inlined matmul is good up to
size 8/16, for sizes 16-64 libxsmm wins, and above 64 openblas is better. For
the small sizes this is mostly due to the eliminated call overhead, I think.
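For reference, a minimal sketch of this kind of timing (an illustration only,
not the actual a.out used for the table above, which also covers newmatmul and
libxsmm): it times the MATMUL intrinsic against a plain DGEMM call for one
small fixed size and reports GFLOP/s. The program name and output format are
assumptions; it needs a BLAS at link time, e.g.
gfortran -O2 -mavx bench.f90 -lopenblas.

! Minimal sketch only, not the benchmark attached to this PR.
program matmul_bench
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 16, loops = 200000
  real(dp), allocatable :: a(:,:), b(:,:), c(:,:)
  real(dp) :: t0, t1, flops
  integer :: i
  external :: dgemm

  allocate(a(n,n), b(n,n), c(n,n))
  call random_number(a)
  call random_number(b)
  flops = 2.0_dp * real(n, dp)**3 * real(loops, dp)

  ! MATMUL intrinsic; gfortran can inline this for small sizes
  ! (see -finline-matmul-limit), avoiding any call overhead.
  call cpu_time(t0)
  do i = 1, loops
     c = matmul(a, b)
  end do
  call cpu_time(t1)
  write(*,'(A,F8.3,A,ES12.5)') 'MATMUL GFLOP/s:', flops/(t1-t0)/1.0e9_dp, &
                               '  checksum', sum(c)

  ! Same work through DGEMM: one library call per iteration.
  call cpu_time(t0)
  do i = 1, loops
     call dgemm('N', 'N', n, n, n, 1.0_dp, a, n, b, n, 0.0_dp, c, n)
  end do
  call cpu_time(t1)
  write(*,'(A,F8.3,A,ES12.5)') 'DGEMM  GFLOP/s:', flops/(t1-t0)/1.0e9_dp, &
                               '  checksum', sum(c)
end program matmul_bench

The checksum prints are there only to keep the compiler from eliminating the
timing loops as dead code.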