The idea I have at the moment (without having looked at any existing theory) is to use a kind of sliding window, similar to how ZIP and other LZ77-based algorithms find repeating strings when compressing: look backwards in the current block for a matching instruction, then scan forward from the match. If the scan gets all the way up to the instruction right before the starting point, then the code is potentially vectorisable. Using the previous example:

    movss 16(%rsp),%xmm0
    addss 32(%rsp),%xmm0
    movss %xmm0,(%rax)
    movss 20(%rsp),%xmm0
    addss 36(%rsp),%xmm0
    movss %xmm0,4(%rax)

Starting at the 4th instruction, it looks back and finds a match in the 1st instruction, albeit with an address that differs by 4. As it scans forward, it finds similar matches in the subsequent instructions, and eventually realises the entire block could potentially be vectorised. If it continues, it finds the code fragment repeats 4 times and can be vectorised with little difficulty. The fact that the block consists only of SSE instructions helps too.

Kit

P.S. I did look at the loop unrolling code, but it almost never triggers due to the small instruction cache that's assumed. For x86-64, is it safe to assume a cache length of 60 instead of 30, since almost all modern Intel and AMD processors have 56+ elements in their queues?

On Sun 10/12/17 13:50 , "Florian Klämpfl" flor...@freepascal.org sent:
> On 10.12.2017 at 02:29, J. Gareth Moreton wrote:
> > Hi everyone,
> >
> > Since I'm masochistic in my desire to understand and improve the Free Pascal Compiler, I would
> > like to add some vectorisation support in its optimisation cycle, since that is one thing that
> > many other compilers attempt to do these days. But before I begin, does FPC support any kind of
> > vectorisation already? If it does I haven't been able to find it yet, and I don't want to end
> > up reinventing the wheel.
>
> I started once to work on this, but never merged it into fpc trunk, it might be even only in my
> local git check out, I can look for it.
>
> > I'm sure it's a mammoth task, but I would like to start somewhere with it - however, are there
> > any design plans that I should be adhering to so I don't end up designing something that is
> > disliked?
>
> Well, basically it means that another pass (like e.g. unroll_loop in optloop.pas) of the tree
> must be added which generated operations as they can be encoded by -Sv.
> To do this efficiently, probably some previous simplification of the tree is needed. But this is
> something for later.

_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
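
The look-back-and-scan idea described at the top of this message could be sketched roughly as follows. This is only an illustration in Python, not FPC code and not a proposed implementation: the names parse, deltas and find_repeat are made up for the sketch, it assumes operands are either plain registers or simple disp(base) memory references in AT&T syntax, and it does no dependency or register-liveness checking, which a real pass would need.

```python
import re

# Assumption: memory operands look like "16(%rsp)" or "(%rax)" only.
MEM = re.compile(r"^(-?\d+)?\((%\w+)\)$")

def parse(line):
    """Split 'movss 16(%rsp),%xmm0' into (mnemonic, [operand tuples])."""
    mnemonic, operands = line.split(None, 1)
    parsed = []
    for op in operands.split(","):
        op = op.strip()
        m = MEM.match(op)
        if m:
            parsed.append(("mem", m.group(2), int(m.group(1) or 0)))
        else:
            parsed.append(("reg", op, 0))
    return mnemonic, parsed

def deltas(a, b):
    """Displacement differences between the memory operands of a and b.

    Returns None if the two instructions differ in anything other than
    their memory displacements; otherwise a (possibly empty) list.
    """
    (mn_a, ops_a), (mn_b, ops_b) = a, b
    if mn_a != mn_b or len(ops_a) != len(ops_b):
        return None
    out = []
    for (kind_a, reg_a, disp_a), (kind_b, reg_b, disp_b) in zip(ops_a, ops_b):
        if kind_a != kind_b or reg_a != reg_b:
            return None
        if kind_a == "mem":
            out.append(disp_b - disp_a)
    return out

def find_repeat(lines, start):
    """Look back from index `start` for a match, then scan forward.

    Returns (period, stride, repeats) if the fragment ending just before
    `start` repeats with one constant non-zero address stride, else None.
    """
    instrs = [parse(line) for line in lines]
    for back in range(start - 1, -1, -1):      # nearest candidate first
        period = start - back
        strides = set()
        repeats = 1                            # the fragment before `start`
        pos = start
        while pos + period <= len(instrs):
            frag = set()
            for k in range(period):
                d = deltas(instrs[pos - period + k], instrs[pos + k])
                if d is None:
                    break
                frag.update(d)
            else:
                # Scan reached the end of the fragment; accept it only
                # if the stride is consistent with what was seen so far.
                if len(strides | frag) <= 1:
                    strides |= frag
                    repeats += 1
                    pos += period
                    continue
            break
        if repeats > 1 and strides and 0 not in strides:
            return period, strides.pop(), repeats
    return None

EXAMPLE = [
    "movss 16(%rsp),%xmm0",
    "addss 32(%rsp),%xmm0",
    "movss %xmm0,(%rax)",
    "movss 20(%rsp),%xmm0",
    "addss 36(%rsp),%xmm0",
    "movss %xmm0,4(%rax)",
]
```

Calling find_repeat(EXAMPLE, 3) starts at the 4th instruction and reports a fragment of 3 instructions repeated twice with an address stride of 4. Only the two copies quoted in this message are listed; with four copies present the same scan would report four repeats.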