https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
Thomas Koenig changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
--- Comment #15 from Jerry DeLisle ---
I wonder if we should backport this as well, since the bug can cause a serious
performance hit without it?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
--- Comment #14 from Thomas Koenig ---
Author: tkoenig
Date: Mon May 8 18:22:44 2017
New Revision: 247755
URL: https://gcc.gnu.org/viewcvs?rev=247755&root=gcc&view=rev
Log:
2017-05-08 Thomas Koenig
PR fortran/79930
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
Dominique d'Humieres changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Last reconfirmed|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
--- Comment #12 from Adam Hirst ---
Created attachment 40940
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40940&action=edit
call graph of my "real" application
Thanks Thomas,
My "real" application is of course not using random numbers for the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
--- Comment #11 from Thomas Koenig ---
A couple of points:
First, the slow random number generation. While I do not
understand why using the loop the way you do makes things
slower with optimization, it is _much_ faster to generate
random
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
--- Comment #10 from Thomas Koenig ---
(In reply to Richard Biener from comment #9)
> If dot_product (matmul (...), ..) can be implemented more optimally (is
> there a blas/lapack primitive for it?) then the best course of action is to
> pattern
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
Richard Biener changed:
What|Removed |Added
Keywords||missed-optimization
--- Comment #9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
--- Comment #8 from Adam Hirst ---
Ah, it seems that Jerry was tinkering with tp_array.f90 (intrinsic array
version of the Vector type), while I was with tp_xyz.f90 (explicit separate
elements). I was going to remark at how he didn't need to use
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
--- Comment #7 from Adam Hirst ---
OK, I tried a little harder, and was able to get a performance increase.
type(Vect3D) pure function TP_LEFT(NU, D, NV) result(tensorproduct)
real(dp), intent(in) :: NU(4), NV(4)
type(Vect3D),
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
--- Comment #6 from Jerry DeLisle ---
Thanks Thomas, somehow I thought we would have built the temporary to do this.
(Well, actually we do, but after the frontend passes)
Now we get:
$ gfc -O2 tp_array.f90
$ time ./a.out
This code variant
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
--- Comment #5 from Adam Hirst ---
Hmm, even with -Ofast, I don't get any noticeable performance increase if I
change, say, TP_LEFT, to be:
type(Vect3D) pure function TP_LEFT(NU, D, NV) result(tensorproduct)
real(dp), intent(in) ::
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
--- Comment #4 from Thomas Koenig ---
Currently, we only inline statements of the form
a = matmul(b,c)
so the more complex expressions in your code are not
inlined (and thus slow). This is a known limitation,
which will not be fixed in time
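The distinction drawn in comment #4 can be illustrated with a minimal
sketch. It assumes plain real arrays rather than the Vect3D type from the
attachments, and the names d, nu, nv and tmp are hypothetical. It only
shows the two ways of writing the expression; whether the second form is
actually inlined, and how much faster it is, depends on the compiler
version and options.

```fortran
program matmul_temp_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Hypothetical data: not the arrays used in the bug report's attachments.
  real(dp) :: nu(4), nv(4), d(4,4), tmp(4)
  real(dp) :: nested, split

  call random_number(nu)
  call random_number(nv)
  call random_number(d)

  ! Nested form: matmul appears inside a larger expression, which
  ! comment #4 says is not inlined.
  nested = dot_product(nu, matmul(d, nv))

  ! Split form: a statement of the shape "a = matmul(b, c)", followed
  ! by a separate dot_product on the explicit temporary.
  tmp = matmul(d, nv)
  split = dot_product(nu, tmp)

  ! Both forms compute the same value up to rounding.
  print *, abs(nested - split) < 1.0e-12_dp
end program matmul_temp_sketch
```

Introducing the temporary by hand keeps each statement in the simple
assignment shape, at the cost of one extra local array.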
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
--- Comment #3 from Adam Hirst ---
Created attachment 40898
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40898&action=edit
Implementation using dimension(3) member
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
--- Comment #2 from Adam Hirst ---
Created attachment 40897
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40897&action=edit
Implementation using %x %y and %z members
Will post the source code here as attachments.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
Jerry DeLisle changed:
What|Removed |Added
CC||jvdelisle at gcc dot gnu.org,