Re: [fpc-devel] Future development plans

Jeppe Johansen Tue, 21 Apr 2020 11:45:15 -0700

One issue with the current state of the intrinsics is that they don't really follow the common style among other languages, and there's no agreed consensus about what path to take yet. To implement the more common style would be a lot of work though compared to the current autogenerated way.

Adding the AVX/AVX2 intrinsics isn't hard. I think I have done it on a branch somewhere including a bunch of fixes.


On 4/21/20 8:29 PM, J. Gareth Moreton wrote:

Hi everyone,
I hope this doesn't become a monthly podcast for me or something, but during my bursts of motivation, inspiration and creativity, I start to plan and research things. There are a few things I'd like to develop for FPC, mostly together because there's a lot of interdependency.
* SSE/AVX intrinsics
Most of the node types for the SSE instructions have been implemented, as well as some wrapper functions that are disabled by default while their format is finalised. The nodes that the compiler generates would be useful when it comes to vectorisation, since a lot of things like parameters and type checks will be already handled by them. There are some gaps though. For example, AVX introduced more powerful 'mask move' instructions that allow you to read as well as write partial vectors, which would be very useful when it comes to, say, optimising algorithms that deal with 3-component vectors (very common because 3-component vectors could represent 3D Cartesian coordinates or an RGB triplet, for example).
* Vectorisation
I think this is probably the next big iteration for the compiler and optimiser. Besides the obvious loop unrolling vectorisation, there are a number of common algorithms that are logically easy to vectorise but which may take some careful analysis to actually detect. One of my test cases is the classic dot product. In raybench.pas, a 3-dimensional dot product appears as part of a function that returns a vector's length: Sqrt(V.X*V.X + V.Y*V.Y + V.Z*V.Z). Under AVX, the expression inside the square root can be optimised into a mask move (so only the first 3 components of an XMM register are loaded with the fields of V and the 4th component set to zero) and then all the additions and multiplications are performed with a single instruction: VDPPS XMM0, XMM0, XMM0, $71 ($71 specifically says 'only multiply and horizontally add the first three components, and then store the result only in the 1st component'; $FF will still work since the 4th component is equal to zero and only the 1st component is read for the result, but is a little more clumsy in my opinion).
My intention, at least for these kinds of algorithms, is to make use of the new intrinsic nodes for specific SSE and AVX instructions, although there are some intrinsics missing, like the aforementioned mask move.
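
To illustrate the $71 immediate discussed above, here is a minimal scalar sketch in C that models what DPPS/VDPPS does on one 128-bit lane. The names dpps_model and vec3_length_squared are made up for this illustration and are not FPC or compiler intrinsics. The real instruction may also add the products in a different order, which can matter for floating-point rounding.

```c
#include <stdint.h>

/* Scalar model of DPPS/VDPPS on four single-precision lanes.
   Bits 7..4 of imm8 choose which products a[i]*b[i] enter the sum.
   Bits 3..0 of imm8 choose which output lanes receive the sum;
   the remaining output lanes are set to zero.
   (Hypothetical helper for illustration, not a real intrinsic.) */
void dpps_model(const float a[4], const float b[4], uint8_t imm8, float out[4])
{
    float sum = 0.0f;
    for (int i = 0; i < 4; i++) {
        if (imm8 & (0x10u << i))
            sum += a[i] * b[i];
    }
    for (int i = 0; i < 4; i++)
        out[i] = (imm8 & (1u << i)) ? sum : 0.0f;
}

/* Squared length of a 3-component vector, mirroring the expression
   inside the Sqrt call. The 4th lane is zero, as it would be after
   a masked load of three components. */
float vec3_length_squared(float x, float y, float z)
{
    const float v[4] = { x, y, z, 0.0f };
    float r[4];
    dpps_model(v, v, 0x71, r);   /* multiply lanes 0..2, store in lane 0 */
    return r[0];
}
```

With $71 the 4th lane is ignored whatever it contains, whereas $FF gives the same value in lane 0 only because the masked load has already zeroed that lane.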
* Pure functions
It might be overly ambitious, but I seek to make the SSE/AVX intrinsics much easier to use (it easily becomes inefficient in C++ if you haven't got data alignments correct). One example I came up with is using masks in SSE/AVX instructions. If you want to call, say, x86_vmaskmovps (an intrinsic for VMASKMOVPS), you would have to set up an additional _m128 store and load in a custom-made mask (e.g. const M128Mask: _m128 = (-1.0; -1.0; -1.0; 0.0); ... x86_vmaskmovps(DestAddr, M128Data, x86_movaps(M128Mask));). This becomes more problematic if you need to specifically represent $80000000 or $FFFFFFFF in one of the floating-point fields (the former is negative zero, and the latter is one of many thousands of quiet NaN representations). An example of a much cleaner solution could be x86_vmaskmovps(DestAddr, M128Data, [True, True, True, False]);, with an explicit typecast/assignment operator that converts an array of Booleans into a mask that could be defined and implemented somewhere in the RTL. Normally, this would be a prohibitively slow function to execute, but if the typecast/assignment operator was defined as a pure function, then it could be evaluated at compile time and the resultant _m128 stored as an implicit constant that is loaded directly into an XMM register when needed, and not having to task the programmer with floating-point bit manipulation in order to create said constant in the code.
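
As a rough sketch of the Boolean-array-to-mask conversion described above, the following C code builds a 4-lane mask and models the store form of VMASKMOVPS in scalar code. The function names are hypothetical and chosen only for this illustration. C has no pure functions, so this shows only the conversion that a pure function could fold into a constant, not the compile-time evaluation itself.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Convert four Booleans into a lane mask: all bits set for true,
   all bits clear for false. VMASKMOVPS only inspects the most
   significant bit of each 32-bit lane.
   (Hypothetical helper for illustration.) */
void mask_from_bools(const bool flags[4], uint32_t mask[4])
{
    for (int i = 0; i < 4; i++)
        mask[i] = flags[i] ? 0xFFFFFFFFu : 0u;
}

/* Scalar model of the store form of VMASKMOVPS: lanes whose mask
   has the top bit set are written, the others are left untouched. */
void maskstore_model(float dest[4], const uint32_t mask[4], const float src[4])
{
    for (int i = 0; i < 4; i++) {
        if (mask[i] & 0x80000000u)
            dest[i] = src[i];
    }
}

/* Raw bit pattern of a float, to show why a constant such as -1.0
   also works as a mask value: its sign bit is set. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Passing {true, true, true, false} yields a mask that writes the first three lanes only, which is the same effect the (-1.0; -1.0; -1.0; 0.0) constant achieves through the sign bits of its floats.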
* Aligned Allocation
This couples with SSE and AVX specifically, but has other uses such as with paging, for example. Following in the footsteps of C11, I would like to propose a couple of new intrinsic operations: GetMemAligned and ReallocMemAligned, that allow you to reserve memory with an alignment of your choice (with the constraint that it has to be a power of 2 and at least the size of a Pointer). Having such intrinsics will also allow the FPC language itself to better support aligned dynamic arrays, for example.
C11's "aligned_alloc" is compatible with "free", while Microsoft's own "_aligned_malloc" is not compatible with "free" and requires its own "_aligned_free" call to properly release. Ideally I would rather find a solution where GetMemAligned and ReallocMemAligned will work with FreeMem without having unpredictable effects. This would be quite an undertaking though, since it would involve deep research into the memory manager and ensuring all platforms have a means with which to support it.
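
For reference, the C11 behaviour mentioned above can be sketched as follows. The wrapper name get_mem_aligned is hypothetical and only loosely mirrors the proposed GetMemAligned. It enforces the same constraints (power of 2, at least the size of a pointer) and rounds the size up to a multiple of the alignment, which C11's aligned_alloc expects. It assumes a C11 or later toolchain whose C library provides aligned_alloc.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Non-zero if alignment is a power of two and at least sizeof(void *). */
int is_valid_alignment(size_t alignment)
{
    return alignment >= sizeof(void *) && (alignment & (alignment - 1)) == 0;
}

/* Hypothetical analogue of the proposed GetMemAligned, built on C11
   aligned_alloc. The returned block is released with plain free().
   Returns NULL on invalid alignment, size overflow, or out of memory. */
void *get_mem_aligned(size_t size, size_t alignment)
{
    if (!is_valid_alignment(alignment))
        return NULL;
    if (size > SIZE_MAX - (alignment - 1))
        return NULL;                      /* rounding up would overflow */

    /* Round the size up to a multiple of the alignment. */
    size_t rounded = (size + alignment - 1) & ~(alignment - 1);
    if (rounded == 0)
        rounded = alignment;              /* avoid a zero-sized request */

    return aligned_alloc(alignment, rounded);
}
```

A block obtained with get_mem_aligned(100, 32) starts on a 32-byte boundary and is handed back with free(), which is the pairing the proposal would like GetMemAligned and FreeMem to have.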
----
I haven't fully organised myself with this yet. Looking at these proposals as a dependency graph, I feel that pure functions is the feature that doesn't depend on anything else and I should focus my efforts here first. I'll be writing up design specifications so hopefully everyone else can understand what's going on and either throw in suggestions, note where performance can be improved or plain shoot something down if it's a very bad idea.
My personal vision... I would like to see Free Pascal being relatively easy to use while still allowing access to powerful features like intrinsics and having a powerful optimising compiler so games and scientific programming can greatly benefit.
What are everyone's thoughts?

Gareth aka. Kit

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
