When it comes to testing vectorcall, uComplex isn't actually the best example, because most of its operators are inlined.  There are a number of tests under "tests/test/cg" that exercise vectorcall and the System V ABI using a Pascal implementation of the opaque __m128 type (the two ABIs should behave exactly the same when dealing with simple vectors).

If anything, though, the example function you gave would be a pretty solid, heavy-duty test of the compiler's attempts to vectorise the code (I'll need to double-check what ComplexScl does, in case it isn't a simple multiplication).  In an ideal world, individual calls to ComplexAdd and ComplexSub (which are simple + and - operations in uComplex) would each compile into a single assembly instruction (ADDPD and SUBPD respectively).  Nevertheless, one could disable inlining to see how well the compiler handles the function chaining: with aligned data, the result in XMM0 should be easy to move in one go to another XMM register, if not simply left in place as a parameter for the next function.
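
To make the non-inlined chaining experiment concrete, here is a minimal sketch. The TComplex record below is a hypothetical stand-in for the uComplex type, and I am assuming a compiler build that accepts the vectorcall directive (it can simply be removed on System V targets). Whether each value really travels in a single XMM register depends on alignment and on the compiler, so treat this as something to inspect, not as a statement of what FPC does.

```pascal
program VectorcallSketch;
{$mode objfpc}

type
  // Hypothetical stand-in for the uComplex type: two doubles, 16 bytes.
  TComplex = record
    Re, Im: Double;
  end;

// Deliberately NOT marked inline, so that the calls survive and the
// parameter passing between chained calls can be inspected.
// Assumption: the compiler in use accepts the vectorcall directive.
function ComplexAdd(const A, B: TComplex): TComplex; vectorcall;
begin
  Result.Re := A.Re + B.Re;
  Result.Im := A.Im + B.Im;
end;

function ComplexSub(const A, B: TComplex): TComplex; vectorcall;
begin
  Result.Re := A.Re - B.Re;
  Result.Im := A.Im - B.Im;
end;

var
  X, Y, S: TComplex;
begin
  X.Re := 1.0; X.Im := 2.0;
  Y.Re := 3.0; Y.Im := 4.0;
  // Chained call: the result of ComplexAdd feeds ComplexSub directly.
  S := ComplexSub(ComplexAdd(X, Y), Y);
  WriteLn(S.Re:0:1, ' ', S.Im:0:1);
end.
```

Compiling with the assembler file kept (for example with the -al option) should show whether the intermediate result is handed straight to the second call or spilled and reloaded.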

Gareth aka. Kit


On 29/10/2019 11:06, Marco van de Voort wrote:

On 2019-10-27 at 09:02, Florian Klämpfl wrote:
I guess you're right.  It just seems weird because the System V ABI was designed from the start to make full use of the XMM registers, so long as the data is aligned.  In effect, it had vectorcall built into its design. Granted, vectorcall has some advantages and can deal with relatively complex aggregates that the System V ABI cannot handle (for example, a record type that contains a normal vector and information relating to bump mapping).

I just hoped that making updates to uComplex, while ensuring existing Pascal code still compiles, would help take advantage of modern ABI designs.

Is there currently any example which shows that vectorcall has any advantage with FPC? If not, I would propose first making FPC able to take advantage of it, and then discussing whether we really add vectorcall. At the moment I fear FPC only gets into trouble when using vectorcall, as it first tries to push everything into one XMM register and then splits it again in the callee.

Nils Haeck's FFT unit might be interesting (same author as the nativejpg unit, IIRC; http://www.simdesign.nl).

It is a Delphi 7 language-level unit that uses a complex record and simple procedures as options. It should be easy to port to uComplex. It is quite high-level and switchable between single and double precision. (I use it in single mode, but to test vectorcall, double mode would presumably be best?)
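
For illustration only, a single/double switch of this kind is usually done with a conditional define. This is a sketch and not the unit's actual source: the define name USE_DOUBLE and the type name TFloat are made up here, and ComplexScl is assumed to scale by a real factor, which still needs checking against the real unit.

```pascal
program PrecisionSwitchSketch;
{$mode objfpc}

// Hypothetical switch: comment this line out to build with Single.
{$define USE_DOUBLE}

type
{$ifdef USE_DOUBLE}
  TFloat = Double;
{$else}
  TFloat = Single;
{$endif}

  // Plain record with two components of the selected precision.
  TComplex = record
    Re, Im: TFloat;
  end;

function ComplexAdd(const A, B: TComplex): TComplex;
begin
  Result.Re := A.Re + B.Re;
  Result.Im := A.Im + B.Im;
end;

// Assumption: scaling multiplies both parts by a real factor.
function ComplexScl(const S: TFloat; const A: TComplex): TComplex;
begin
  Result.Re := S * A.Re;
  Result.Im := S * A.Im;
end;

var
  A, B, C: TComplex;
begin
  A.Re := 1.0; A.Im := 2.0;
  B.Re := 0.5; B.Im := 0.5;
  C := ComplexScl(2.0, ComplexAdd(A, B));
  WriteLn('SizeOf(TComplex) = ', SizeOf(TComplex));
  WriteLn(C.Re:0:1, ' ', C.Im:0:1);
end.
```

In double mode the record is 16 bytes, which is exactly one XMM register's worth of packed doubles; in single mode it is 8 bytes and fills only half a register, which is why double mode looks like the more telling case.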

And it has routines that do a variety of complex operations.

procedure FFT_5(var Z: array of TComplex); // usage of open array is to make things generic. Could be solved differently.

var
  T1, T2, T3, T4, T5: TComplex;
  M1, M2, M3, M4, M5: TComplex;
  S1, S2, S3, S4, S5: TComplex;
begin
  T1 := ComplexAdd(Z[1], Z[4]);
  T2 := ComplexAdd(Z[2], Z[3]);
  T3 := ComplexSub(Z[1], Z[4]);
  T4 := ComplexSub(Z[3], Z[2]);

  T5   := ComplexAdd(T1, T2);
  Z[0] := ComplexAdd(Z[0], T5);
  M1   := ComplexScl(c51, T5);
  M2   := ComplexScl(c52, ComplexSub(T1, T2));

  M3.Re := -c53 * (T3.Im + T4.Im);  // replace by i*add(t3,t4).scale(c53-i*c53) ?
  M3.Im :=  c53 * (T3.Re + T4.Re);
  M4.Re := -c54 * T4.Im;
  M4.Im :=  c54 * T4.Re;
  M5.Re := -c55 * T3.Im;
  M5.Im :=  c55 * T3.Re;

  S3 := ComplexSub(M3, M4);
  S5 := ComplexAdd(M3, M5);
  S1 := ComplexAdd(Z[0], M1);
  S2 := ComplexAdd(S1, M2);
  S4 := ComplexSub(S1, M2);

  Z[1] := ComplexAdd(S2, S3);
  Z[2] := ComplexAdd(S4, S5);
  Z[3] := ComplexSub(S4, S5);
  Z[4] := ComplexSub(S2, S3);
end;

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

