Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices

Jerry DeLisle Sun, 13 Nov 2016 17:01:39 -0800

On 11/13/2016 04:55 PM, Steve Kargl wrote:

On Sun, Nov 13, 2016 at 04:08:50PM -0800, Jerry DeLisle wrote:

Hi all,


The attached patch implements a fast blocked matrix multiply. The basic algorithm
is derived from the tuned BLAS dgemm on netlib.org. See matmul.m4 for reference.
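
For readers unfamiliar with the technique, here is a minimal sketch of a blocked (tiled) multiply in C. It is not the code from the patch or from matmul.m4: the function name blocked_matmul, the tile size BLOCK, and the row-major layout are all assumptions chosen for illustration. The idea is only that working on small tiles keeps the operands in cache while they are reused.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile size; a tuned version would choose it to fit the cache. */
#define BLOCK 32

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Illustrative only: C (m x n) += A (m x k) * B (k x n).
   All three matrices are assumed row-major and contiguous.
   The caller is expected to zero C first for a plain product. */
void blocked_matmul(size_t m, size_t n, size_t k,
                    const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    /* Outer three loops walk over tiles of the iteration space. */
    for (size_t ii = 0; ii < m; ii += BLOCK) {
        size_t i_end = min_size(ii + BLOCK, m);
        for (size_t kk = 0; kk < k; kk += BLOCK) {
            size_t k_end = min_size(kk + BLOCK, k);
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t j_end = min_size(jj + BLOCK, n);

                /* Inner three loops multiply one tile of A by one tile of B. */
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t p = kk; p < k_end; p++) {
                        double a_ip = a[i * k + p];
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += a_ip * b[p * n + j];
                    }
                }
            }
        }
    }
}
```

The innermost loop runs over contiguous elements of B and C, which is the kind of loop that unrolling and vectorization tend to help.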

The matmul() function is compiled with -Ofast -funroll-loops, which is
accomplished using #pragma optimize ( string ). The option string can be
customized further if it turns out to enable an undesired optimization.
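
As a rough illustration of per-function optimization options, the sketch below uses the spelling that GCC documents, #pragma GCC optimize, together with push_options and pop_options so the setting applies only to the enclosed function. The exact pragma text in the patch is not shown in this message, so the option strings here simply mirror the flags named above, and the axpy kernel is a hypothetical stand-in rather than anything from matmul.m4.

```c
#include <assert.h>
#include <stddef.h>

/* Save the current options, then request -Ofast -funroll-loops for the
   functions that follow. Compilers other than GCC may ignore these pragmas. */
#pragma GCC push_options
#pragma GCC optimize ("Ofast", "unroll-loops")

/* Hypothetical kernel used only to show the pragma's scope: y += alpha * x. */
void axpy(size_t n, double alpha, const double *x, double *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Restore the options that were in effect before push_options. */
#pragma GCC pop_options
```

Note that -Ofast permits floating-point transformations that are not strictly IEEE-conforming, which is one reason someone might want to trim the option string.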


Did you run any tests with '--param max-unroll-times=4', or with
some value other than 4?  On troutmask, with my code, I've found
that 4 seems to work the best with -funroll-loops.

I have not tried this, but will give it a try. Also, I have not tested on your FreeBSD machine yet.


Jerry
