Hi there!
I am developing software which deliberately exploits the compiler's
autovectorization facilities by processing data in
autovectorization-friendly loops. I'm currently using both g++ and
clang++ to see how well this approach works. With simple arithmetic, I
often get good results. To widen the scope of my work, I was looking for
documentation on which constructs are recognized by the
autovectorization stage, and found:
https://www.gnu.org/software/gcc/projects/tree-ssa/vectorization.html
By the looks of it, this document has not seen any changes for several
years. Has development on the autovectorization stage stopped, or is
there simply no documentation?
In my experience, vectorization is essential to speed up arithmetic on
the CPU, and reliable recognition of vectorization opportunities by the
compiler can bring vectorization to programs which do not code it
explicitly. I feel the topic is being neglected - at least the
documentation I found suggests this. To demonstrate what I mean, here
are two concrete scenarios which I'd like to see handled by the
autovectorization stage:
- gather/scatter with arbitrary indexes

  In C, this would be loops like

    // gather from B to A using gather indexes
    for ( int i = 0 ; i < vsz ; i++ )
      A [ i ] = B [ indexes [ i ] ] ;

  AVX2 introduced hardware gather instructions, and AVX-512 added
  scatter; these can speed things up a good deal.
- repeated use of vectorizable functions

    for ( int i = 0 ; i < vsz ; i++ )
      A [ i ] = sqrt ( B [ i ] ) ;

  Here, replacing the repeated calls to sqrt with the vectorized
  equivalent gives a dramatic speedup (ca. 4x).
If the compiler were to provide the autovectorization facilities, and if
the patterns it recognizes were well-documented, users could rely on
certain code patterns being recognized and autovectorized - sort of a
contract between the user and the compiler. With a well-chosen spectrum
of patterns, this would make explicit vectorization unnecessary in many
cases. My hope is that such an interface would help vectorization to
become more widely used - as I understand the status quo, this is still
a niche topic, even though many processors provide suitable hardware
nowadays.
Can you point me to where 'the action is' in this regard?
With regards
Kay F. Jahnke