On 05/19/2017 11:24 PM, Sven Barth via fpc-pascal wrote:
> On 19.05.2017 19:22, Karoly Balogh (Charlie/SGR) wrote:
>> Hi,
>>
>> On Fri, 19 May 2017, Sven Barth via fpc-pascal wrote:
>>
>>> I think Jeppe wanted to add vector support. The question here, though,
>>> is whether one wants to optimize/detect this at the AST level and
>>> convert it to implicit vectors, or at the CSE level.
>> I think an optimization/simplification should be done at the highest
>> level at which it is possible. Otherwise the lower layers get really
>> messy, as they already are in some cases. Well, in general, we should
>> up our floating point game. For example, if Nikolay's recent
>> load-modify-store optimization worked on floats, that would already be
>> a nice step forward in this case. ;) (Sorry for my ignorance if it
>> already works; I missed that then.)

No, it does not work for floats yet, but feel free to add support for them as well :)

> I agree that we should improve that. Maybe we should also allow for more
> FPU-type-specific helper routines. Currently, on i386 and x86_64 the x87
> FPU is used even if -CfsseX is given and only Single/Double are used,
> because the compiler defaults to Extended. If SSE isn't used that might
> make sense, but for SSE we should fall back to Double if we're only
> dealing with Double, IMHO (and analogously for Single).
>
>>> By the way: I think my commit today of an SSE Frac() implementation
>>> sped up the framerate by a third on Win64 compared to the version
>>> without it :D
>> Cool, but shouldn't this be an inline node instead, for real speed++...? ;)
>> I mean, if Trunc() and Round() are...
> Ah, right, I hadn't seen that we do indeed have an inline node
> implementation for x86. I should probably put that on the list then :D

Yes, we do. And we can, in fact, make similar ones for many routines in the math unit as well. It is already on my to-do list, but feel free to start working on it if you have time, since I also have other things to do and I don't know when I'm going to even start this one :)

Btw, the sincos() routine is also a good candidate for inlining, and so are the divmod routines and the min/max routines (the latter are good candidates for the cmov instruction on i686+). Once these are inline, we can even add optimization passes that convert calls to sin(x) and cos(x) with the same argument, close to each other and with no side effects between them, into a single sincos() call; likewise div and mod into divmod, etc.
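To make the proposed rewrite concrete, here is a minimal Free Pascal sketch of the source-level equivalence such a pass would rely on. It is not an existing compiler pass. It shows separate Sin()/Cos() calls on one argument next to a single SinCos() call, and separate div/mod next to one DivMod() call. It assumes the Math unit's Double overload of SinCos and its LongInt overload of DivMod. The values 0.5, 17 and 5 are arbitrary.

```pascal
program FuseSketch;

{$mode objfpc}

uses
  Math;

var
  x, s1, c1, s2, c2: Double;
  a, b, q1, r1, q2, r2: LongInt;
begin
  x := 0.5;

  { Before: Sin and Cos of the same argument with no side effects
    in between, so the two calls could legally be fused. }
  s1 := Sin(x);
  c1 := Cos(x);

  { After: the single call such a pass would emit instead. }
  SinCos(x, s2, c2);

  a := 17;
  b := 5;

  { Before: div and mod on the same operands. }
  q1 := a div b;
  r1 := a mod b;

  { After: one combined call (quotient and remainder are var parameters). }
  DivMod(a, b, q2, r2);

  { The fused forms must agree with the separate ones (within rounding
    for the trigonometric case, since the instruction sequences may differ). }
  WriteLn('sin/cos match: ', (Abs(s1 - s2) < 1e-12) and (Abs(c1 - c2) < 1e-12));
  WriteLn('div/mod:       ', q1, ' ', r1);
  WriteLn('DivMod:        ', q2, ' ', r2);
end.
```

Both integer lines should show `3 2`. The program only checks that the two forms agree. It says nothing about speed, which depends on how SinCos and DivMod are implemented for the target. For example, x87 has a single FSINCOS instruction.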

Nikolay
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
