On 2019-10-27 at 09:02, Florian Klämpfl wrote:
>> I guess you're right. It just seems weird because the System V ABI
>> was designed from the start to use the MM registers fully, so long as
>> the data is aligned. In effect, it had vectorcall wrapped into its
>> design from the start. Granted, vectorcall has some advantages and
>> can deal with relatively complex aggregates that the System V ABI
>> cannot handle (for example, a record type that contains a normal
>> vector and information relating to bump mapping).
>>
>> I just hoped that making updates to uComplex, while ensuring existing
>> Pascal code still compiles, would help take advantage of modern ABI
>> designs.
>
> Is there currently any example which shows that vectorcall has any
> advantage with FPC? Otherwise I would propose first making FPC able to
> take advantage of it, and only then discussing whether we really add
> vectorcall. I fear that, currently, FPC only gets into trouble when
> using vectorcall, as it first tries to push everything into one xmm
> register and then splits this again in the callee.

Nils Haeck's FFT unit might be interesting (same guy as the nativejpg
unit, iirc; http://www.simdesign.nl).

It is a D7 language-level unit that uses a complex record and simple
procedures as options. It should be easy to port to ucomplex. It is
quite high-level and switchable between single and double. (I use it in
single mode, but to test vectorcall, double mode would obviously be best?)

It also has routines that perform a variety of complex operations, for
example:

// The open array is used to make things generic. Could be solved differently.
procedure FFT_5(var Z: array of TComplex);
var
  T1, T2, T3, T4, T5: TComplex;
  M1, M2, M3, M4, M5: TComplex;
  S1, S2, S3, S4, S5: TComplex;
begin
  T1 := ComplexAdd(Z[1], Z[4]);
  T2 := ComplexAdd(Z[2], Z[3]);
  T3 := ComplexSub(Z[1], Z[4]);
  T4 := ComplexSub(Z[3], Z[2]);
  T5 := ComplexAdd(T1, T2);
  Z[0] := ComplexAdd(Z[0], T5);
  M1 := ComplexScl(c51, T5);
  M2 := ComplexScl(c52, ComplexSub(T1, T2));
  // replace by i*add(t3,t4).scale(c53-i*c53) ?
  M3.Re := -c53 * (T3.Im + T4.Im);
  M3.Im := c53 * (T3.Re + T4.Re);
  M4.Re := -c54 * T4.Im;
  M4.Im := c54 * T4.Re;
  M5.Re := -c55 * T3.Im;
  M5.Im := c55 * T3.Re;
  S3 := ComplexSub(M3, M4);
  S5 := ComplexAdd(M3, M5);
  S1 := ComplexAdd(Z[0], M1);
  S2 := ComplexAdd(S1, M2);
  S4 := ComplexSub(S1, M2);
  Z[1] := ComplexAdd(S2, S3);
  Z[2] := ComplexAdd(S4, S5);
  Z[3] := ComplexSub(S4, S5);
  Z[4] := ComplexSub(S2, S3);
end;
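
FFT_5 relies on TComplex, ComplexAdd, ComplexSub and ComplexScl, none of which are shown here. Below is a minimal sketch of what they might look like, assuming a plain record with Re/Im fields and a scalar-first ComplexScl (as the calls above suggest). The TFloat alias, MakeComplex and ComplexMulI are hypothetical names introduced for illustration, and the value used for c53 is a placeholder, not the real FFT constant. The main block only checks that the field-wise M3 assignment equals multiplying (T3 + T4) by i and scaling by c53.

```pascal
program ComplexSketch;

{$mode objfpc}

type
  // Hypothetical alias: the unit is said to switch between Single and Double.
  TFloat = Double;

  // Assumed layout: a plain record with real and imaginary parts.
  TComplex = record
    Re, Im: TFloat;
  end;

// Hypothetical convenience constructor.
function MakeComplex(const ARe, AIm: TFloat): TComplex;
begin
  Result.Re := ARe;
  Result.Im := AIm;
end;

function ComplexAdd(const A, B: TComplex): TComplex;
begin
  Result.Re := A.Re + B.Re;
  Result.Im := A.Im + B.Im;
end;

function ComplexSub(const A, B: TComplex): TComplex;
begin
  Result.Re := A.Re - B.Re;
  Result.Im := A.Im - B.Im;
end;

// Scale a complex value by a real scalar (scalar first, as in FFT_5).
function ComplexScl(const S: TFloat; const A: TComplex): TComplex;
begin
  Result.Re := S * A.Re;
  Result.Im := S * A.Im;
end;

// Hypothetical helper: multiply by i, i.e. (Re, Im) -> (-Im, Re).
function ComplexMulI(const A: TComplex): TComplex;
var
  OldRe: TFloat;
begin
  OldRe := A.Re;          // copy first, in case Result aliases A
  Result.Re := -A.Im;
  Result.Im := OldRe;
end;

var
  T3, T4, M3a, M3b: TComplex;
  c53: TFloat;
begin
  T3 := MakeComplex(1.0, 2.0);
  T4 := MakeComplex(3.0, 5.0);
  c53 := 0.5;             // placeholder value, not the real constant

  // Field-wise form, as written in FFT_5.
  M3a.Re := -c53 * (T3.Im + T4.Im);
  M3a.Im := c53 * (T3.Re + T4.Re);

  // Same value expressed through the helpers: c53 * i * (T3 + T4).
  M3b := ComplexScl(c53, ComplexMulI(ComplexAdd(T3, T4)));

  WriteLn(M3a.Re:0:2, ' ', M3a.Im:0:2);   // prints -3.50 2.00
  WriteLn(M3b.Re:0:2, ' ', M3b.Im:0:2);   // prints -3.50 2.00
end.
```

With small by-value-sized records like this, each helper call passes and returns one complex number, which is the case where a register-based convention such as vectorcall would be expected to matter, if the compiler keeps the values in registers.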