On 2019-10-27 at 09:02, Florian Klämpfl wrote:
I guess you're right.  It just seems weird because the System V ABI was designed from the start to use the MM registers fully, so long as the data is aligned.  In effect, it had vectorcall wrapped into its design from the start.  Granted, vectorcall has some advantages and can deal with relatively complex aggregates that the System V ABI cannot handle (for example, a record type that contains a normal vector and information relating to bump mapping).

I just hoped that making updates to uComplex, while ensuring existing Pascal code still compiles, would help take advantage of modern ABI designs.

Is there currently any example which shows that vectorcall has any advantage with FPC? If not, I would propose first making FPC able to take advantage of it, and only then discussing whether we really add vectorcall. Currently I fear FPC only gets into trouble when using vectorcall, as it first tries to push everything into one xmm register and then splits it again in the callee.

Nils Haeck's FFT unit might be interesting. (same guy as nativejpg unit iirc, http://www.simdesign.nl)

It is a Delphi 7 language-level unit that uses a complex record and simple procedures as options. It should be easy to port to uComplex. It is fairly high-level and switchable between single and double precision. (I use it in single mode, but to test vectorcall, double mode would presumably be best?)

And it has routines that do a variety of complex operations.
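To make the "complex record and simple procedures" style concrete, here is a minimal sketch of what such helpers could look like. The names TComplex, ComplexAdd, ComplexSub and ComplexScl are taken from the FFT_5 routine quoted in this mail; the record layout, the TFloat alias and the exact parameter lists are assumptions for illustration, not the unit's actual declarations.

```pascal
program ComplexSketch;

{$mode objfpc}

type
  // Assumption: a single alias selects the precision; change Double to
  // Single to mimic the unit's single mode.
  TFloat = Double;

  // Assumed layout: two floats, real part first.
  TComplex = record
    Re, Im: TFloat;
  end;

// Component-wise sum of two complex values.
function ComplexAdd(const A, B: TComplex): TComplex;
begin
  Result.Re := A.Re + B.Re;
  Result.Im := A.Im + B.Im;
end;

// Component-wise difference of two complex values.
function ComplexSub(const A, B: TComplex): TComplex;
begin
  Result.Re := A.Re - B.Re;
  Result.Im := A.Im - B.Im;
end;

// Scale by a real factor. The scalar-first argument order is inferred
// from calls such as ComplexScl(c51, T5).
function ComplexScl(S: TFloat; const A: TComplex): TComplex;
begin
  Result.Re := S * A.Re;
  Result.Im := S * A.Im;
end;

var
  X, Y, R: TComplex;
begin
  X.Re := 1; X.Im := 2;
  Y.Re := 3; Y.Im := -1;
  R := ComplexScl(2, ComplexAdd(X, Y));  // 2 * ((1+2i) + (3-i))
  WriteLn(R.Re:0:1, ' ', R.Im:0:1);      // prints "8.0 2.0"
end.
```

Every helper takes and returns a small record by value, so routines like these are the ones whose code generation would differ depending on how the calling convention passes a two-float aggregate.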

procedure FFT_5(var Z: array of TComplex); // usage of open array is to make things generic. Could be solved differently.
var
  T1, T2, T3, T4, T5: TComplex;
  M1, M2, M3, M4, M5: TComplex;
  S1, S2, S3, S4, S5: TComplex;
begin
  T1 := ComplexAdd(Z[1], Z[4]);
  T2 := ComplexAdd(Z[2], Z[3]);
  T3 := ComplexSub(Z[1], Z[4]);
  T4 := ComplexSub(Z[3], Z[2]);

  T5   := ComplexAdd(T1, T2);
  Z[0] := ComplexAdd(Z[0], T5);
  M1   := ComplexScl(c51, T5);
  M2   := ComplexScl(c52, ComplexSub(T1, T2));

  M3.Re := -c53 * (T3.Im + T4.Im);  // replace by i*add(t3,t4).scale(c53-i*c53) ?
  M3.Im :=  c53 * (T3.Re + T4.Re);
  M4.Re := -c54 * T4.Im;
  M4.Im :=  c54 * T4.Re;
  M5.Re := -c55 * T3.Im;
  M5.Im :=  c55 * T3.Re;

  S3 := ComplexSub(M3, M4);
  S5 := ComplexAdd(M3, M5);
  S1 := ComplexAdd(Z[0], M1);
  S2 := ComplexAdd(S1, M2);
  S4 := ComplexSub(S1, M2);

  Z[1] := ComplexAdd(S2, S3);
  Z[2] := ComplexAdd(S4, S5);
  Z[3] := ComplexSub(S4, S5);
  Z[4] := ComplexSub(S2, S3);
end;

fpc-devel maillist  -  fpc-devel@lists.freepascal.org
