When it comes to testing vectorcall, uComplex actually isn't the best
example, because most of its operators are inlined. There are a number
of tests under "tests/test/cg" that exercise vectorcall and the System V
ABI using a Pascal implementation of the opaque __m128 type (the two
ABIs should behave exactly the same when dealing with simple vectors).
If anything, though, the example function you gave would be a pretty
solid and heavy-duty test of the compiler's attempts to vectorise the
code (I'll need to double-check what ComplexScl does, in case it isn't
a simple multiplication). In an ideal world, individual calls to
ComplexAdd and ComplexSub (which are simple + and - operations in
uComplex) will each compile into a single assembly instruction (ADDPD
and SUBPD respectively). Nevertheless, one could disable the inlining to
see how well the compiler handles the function chaining, since with
aligned data, the result in XMM0 should be easy to move in one go to
another XMM register, if not just left in place as parameter data for
the next function.
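As a rough illustration of the non-inlined chaining test described
above, here is a minimal sketch. The TComplex record and the two
function bodies are assumptions written for this example (they are not
copied from uComplex or from the FFT unit), no alignment is forced on
the record, and no calling convention is specified, so the platform
default applies. Adding a calling convention directive to the two
declarations would be the obvious next experiment.

```pascal
program chaintest;
{$mode objfpc}

type
  { Hypothetical stand-in for the complex type: two doubles, 16 bytes
    in total, i.e. the width of one XMM register. }
  TComplex = record
    Re, Im: Double;
  end;

{ No "inline" directive, so these should stay real calls unless the
  compiler's automatic inlining is switched on. }
function ComplexAdd(const A, B: TComplex): TComplex;
begin
  Result.Re := A.Re + B.Re;
  Result.Im := A.Im + B.Im;
end;

function ComplexSub(const A, B: TComplex): TComplex;
begin
  Result.Re := A.Re - B.Re;
  Result.Im := A.Im - B.Im;
end;

var
  Z1, Z2, T: TComplex;
begin
  Z1.Re := 1.0; Z1.Im := 2.0;
  Z2.Re := 3.0; Z2.Im := 5.0;
  { Chained calls: the result of ComplexAdd is passed straight on as the
    first parameter of ComplexSub. }
  T := ComplexSub(ComplexAdd(Z1, Z2), Z2);
  WriteLn(T.Re:0:1, ' ', T.Im:0:1);  { prints "1.0 2.0" }
end.
```

Compiling with -al keeps the generated assembler file, which makes it
possible to see how the intermediate result travels between the two
calls.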
Gareth aka. Kit
On 29/10/2019 11:06, Marco van de Voort wrote:
On 2019-10-27 at 09:02, Florian Klämpfl wrote:
I guess you're right. It just seems weird because the System V ABI
was designed from the start to use the XMM registers fully, so long as
the data is aligned. In effect, it had vectorcall wrapped into its
design from the start. Granted, vectorcall has some advantages and
can deal with relatively complex aggregates that the System V ABI
cannot handle (for example, a record type that contains a normal
vector and information relating to bump mapping).
I just hoped that making updates to uComplex, while ensuring
existing Pascal code still compiles, would help take advantage of
modern ABI designs.
Is there currently any example which shows that vectorcall has any
advantage with FPC? Otherwise I would propose first making FPC able to
take advantage of it, and only then discussing whether we really add
vectorcall. Currently, I fear FPC only gets into trouble when using
vectorcall, because it first tries to pack everything into one XMM
register and then splits it again in the callee.
Nils Haeck's FFT unit might be interesting (same guy as the nativejpg
unit iirc, http://www.simdesign.nl).
It is a Delphi 7 language-level unit that uses a complex record and
simple procedures as options. It should be easy to port to uComplex. It
is quite high-level and switchable between single and double. (I use it
in single mode, but to test vectorcall, double mode would obviously be
best?)
It also has routines that do a variety of complex operations.
// Usage of open array is to make things generic. Could be solved
// differently.
procedure FFT_5(var Z: array of TComplex);
var
  T1, T2, T3, T4, T5: TComplex;
  M1, M2, M3, M4, M5: TComplex;
  S1, S2, S3, S4, S5: TComplex;
begin
  T1 := ComplexAdd(Z[1], Z[4]);
  T2 := ComplexAdd(Z[2], Z[3]);
  T3 := ComplexSub(Z[1], Z[4]);
  T4 := ComplexSub(Z[3], Z[2]);
  T5 := ComplexAdd(T1, T2);
  Z[0] := ComplexAdd(Z[0], T5);
  M1 := ComplexScl(c51, T5);
  M2 := ComplexScl(c52, ComplexSub(T1, T2));
  // replace by i*add(t3,t4).scale(c53-i*c53) ?
  M3.Re := -c53 * (T3.Im + T4.Im);
  M3.Im := c53 * (T3.Re + T4.Re);
  M4.Re := -c54 * T4.Im;
  M4.Im := c54 * T4.Re;
  M5.Re := -c55 * T3.Im;
  M5.Im := c55 * T3.Re;
  S3 := ComplexSub(M3, M4);
  S5 := ComplexAdd(M3, M5);
  S1 := ComplexAdd(Z[0], M1);
  S2 := ComplexAdd(S1, M2);
  S4 := ComplexSub(S1, M2);
  Z[1] := ComplexAdd(S2, S3);
  Z[2] := ComplexAdd(S4, S5);
  Z[3] := ComplexSub(S4, S5);
  Z[4] := ComplexSub(S2, S3);
end;
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel