Hi everyone,

Just thought I'd give a heads-up on my latest mad experiments!

I'm currently working to see if I can improve auto-vectorisation within the compiler.  I'm using x86_64 as my starting point since SSE2 is guaranteed to be present, but I'm aiming to keep the design as cross-platform as possible so that it can be ported to AArch64 and the like.

Current status:

 * It currently compiles and works, but generally performs worse than
   without auto-vectorisation because the compiler forces everything
   into memory (its usual fall-back when it doesn't quite know what to
   do with a storage type).
 * I'm using the ucomplex.pp unit as my test case, while also telling it
   to use 'vectorcall' in all of the routines.  Because the complex type
   is just two Doubles, it fits exactly into a single 128-bit XMM
   register.
 * I tried to reuse LOC_SUBSETREG and LOC_CSUBSETREG for locations that
   occupy specific lanes of an MM register, but this caused problems
   because those types are designed only for integer registers.  I have
   therefore created new LOC_MMLANE and LOC_CMMLANE types, with an
   associated structure within the TLocation union, which are
   specifically designed for MM registers (and so don't have to handle
   bitpacking).  This also allows me to write new methods like
   a_loadmm_reg_lane instead of re-using and over-complicating existing
   ones.  (I also made sure to follow the convention of keeping
   LOC_REFERENCE and LOC_CREFERENCE last.)
 * Currently I'm only supporting 128-bit MM types.  256-bit and above
   will come at a later date.
 * Currently auto-vectorisation is always attempted, but later I will
   disable it below a certain optimisation level (-O2 or -O3; I haven't
   decided which yet).
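
To make the "two Doubles in one XMM register" idea from the list above a bit more concrete, here is a rough sketch in C with SSE2 intrinsics.  It is not FPC code and not what the compiler currently emits; it only illustrates the kind of result a vectoriser could aim for on a complex addition, with the real part in lane 0 and the imaginary part in lane 1.  The names complex_t, cadd_scalar, cadd_packed, lane0 and lane1 are made up for this illustration, and the scalar fall-back is there only so the sketch still builds on targets without SSE2.

```c
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   /* SSE2 intrinsics */
#define HAVE_SSE2 1
#endif

/* Hypothetical stand-in for a complex type made of two Doubles. */
typedef struct { double re, im; } complex_t;

/* Scalar reference: two separate additions. */
static complex_t cadd_scalar(complex_t a, complex_t b)
{
    complex_t r = { a.re + b.re, a.im + b.im };
    return r;
}

#ifdef HAVE_SSE2
/* Read lane 0 (the low 64 bits) of a 128-bit register. */
static double lane0(__m128d v) { return _mm_cvtsd_f64(v); }

/* Read lane 1 by moving the high 64 bits down into lane 0 first. */
static double lane1(__m128d v) { return _mm_cvtsd_f64(_mm_unpackhi_pd(v, v)); }
#endif

/* Packed version: one ADDPD handles both components at once. */
static complex_t cadd_packed(complex_t a, complex_t b)
{
#ifdef HAVE_SSE2
    /* _mm_set_pd takes the high lane first, so re lands in lane 0. */
    __m128d va = _mm_set_pd(a.im, a.re);
    __m128d vb = _mm_set_pd(b.im, b.re);
    __m128d vr = _mm_add_pd(va, vb);
    complex_t r = { lane0(vr), lane1(vr) };
    return r;
#else
    /* Assumption: no SSE2 available, so fall back to the scalar path. */
    return cadd_scalar(a, b);
#endif
}
```

Calling cadd_packed on two values should give the same result as cadd_scalar.  The lane0/lane1 helpers are only meant to show what "a location that occupies a specific lane of an MM register" looks like when written out by hand.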

Kit


_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel