To get back on track with uComplex, I didn't change any routines to make them inline - they were that way already.  All I did was change the parameters to 'const', align the complex type so it is equivalent to __m128d so the System V ABI can pass it all in one register, and enable vectorcall on Win64 so the same thing can happen on that platform.  Is that really too much?
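
To make the alignment point concrete, here is a minimal sketch in C rather than Pascal. The type names are hypothetical stand-ins, not the real uComplex declarations. It only shows that a record of two doubles can be given the size and alignment of a 128-bit vector; whether it is then passed in a single register depends on how the compiler classifies the type, which this sketch does not demonstrate.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the uComplex record (assumption, not FPC code). */
typedef struct { double re, im; } complex_plain;   /* natural alignment */

typedef struct {
    _Alignas(16) double re;   /* force the whole record onto a 16-byte boundary */
    double im;
} complex_aligned;

/* Both layouts occupy 16 bytes; only the second guarantees 16-byte alignment. */
_Static_assert(sizeof(complex_plain) == 16, "two doubles fill 16 bytes");
_Static_assert(sizeof(complex_aligned) == 16, "alignment adds no padding here");
_Static_assert(_Alignof(complex_aligned) == 16, "same alignment as a 128-bit vector");

/* Returns nonzero when p lies on a 16-byte boundary, as aligned SSE moves need. */
static int is_16_byte_aligned(const void *p) {
    return ((uintptr_t)p % 16) == 0;
}
```

Any variable of the aligned type satisfies `is_16_byte_aligned`, while one of the plain type may or may not, depending on where it happens to land.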

Changing the Win64 build of FPC to default to vectorcall is an option, although the ability to fall back to the fastcall-based convention needs to exist for the sake of interfacing with third-party libraries, and it doesn't change the fact that the complex type still needs to be aligned.  Either way, it might break assembler code that calls the uComplex functions, but my argument still stands: I don't think this is a realistic set-up in the grand scheme of things.

Gareth aka. Kit

On 31/10/2019 21:13, Florian Klämpfl wrote:
On 31.10.19 at 20:11, Marco van de Voort wrote:

On 2019-10-30 at 23:02, Florian Klämpfl wrote:

Yes. And manually adding inline is only as good as the knowledge of the user doing so. If somebody implements it right (I did not; I took the easiest approach and used an existing function to estimate the complexity of a subroutine), the compiler can just count the number of generated instructions or even calculate the length of the procedure and then decide whether to keep the node tree for inlining.
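
The "estimate the complexity of a subroutine" approach can be sketched roughly as follows. This is a hedged illustration in C, not FPC's actual code: the `Node` type, `node_count`, `worth_inlining` and the threshold value are all hypothetical names and numbers chosen for the example.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified node tree (assumption: FPC's real
   node classes carry far more information than two child pointers). */
typedef struct Node {
    struct Node *left;
    struct Node *right;
} Node;

/* Estimate a routine's complexity as the number of nodes in its tree. */
static size_t node_count(const Node *n) {
    if (n == NULL) {
        return 0;
    }
    return 1 + node_count(n->left) + node_count(n->right);
}

/* Arbitrary illustrative threshold, not a value taken from any compiler. */
enum { INLINE_NODE_LIMIT = 8 };

/* Decide whether the node tree is small enough to keep for inlining. */
static int worth_inlining(const Node *body) {
    return node_count(body) <= INLINE_NODE_LIMIT;
}
```

Counting generated instructions or the final code length instead would replace `node_count` with a measurement taken after code generation; the decision step stays the same.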

Well, it depends, of course, on what happens when. Would you really count final instructions or cycles after all optimization and peephole passes?

This is not really an issue: for inlining, it is mainly the instruction count/code length that matters, and the ARM compiler, for example, already does this (something more complex, actually) because it has to insert the constant tables at the right locations in the code, as the relative offsets are limited.