[RFC] GCC port for ARM's Scalable Vector Extension

Richard Sandiford Fri, 11 Nov 2016 09:48:03 -0800

Hi,

We'd like to submit GCC support for ARM's Scalable Vector Extension (SVE).
For more details about the extension itself, please see Nigel's blog post at:

https://community.arm.com/groups/processors/blog/2016/08/22/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture

Francesco has also written a white paper about programming for SVE,
with some worked examples:

http://developer.arm.com/hpc/a-sneak-peek-into-sve-and-vla-programming

In summary though, there are two main features that have a significant
impact on the GCC port:

(1) The length of the registers is an implementation choice and is only
    known at runtime. In GCC terms, things like type sizes, mode sizes,
    numbers of vector elements, and frame offsets can all depend on a
    runtime parameter.

(2) The extension has been designed to enable the main vector loop to
    handle the epilogue as a final partial iteration, without loss of
    efficiency for non-epilogue iterations.
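
As a rough illustration of how (1) and (2) combine, here is a scalar C
model of a predicated, length-agnostic loop. It is only a sketch under
stated assumptions: it is not SVE code and not what the port generates.
The names vla_add, vl and MAX_LANES are hypothetical, and vl stands in
for the per-vector element count that real hardware fixes at runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LANES 64  /* hypothetical bound, needed only by this model */

/* Scalar model of a predicated loop computing dst[i] = a[i] + b[i].
   `vl` is the number of elements per vector, passed in at runtime.
   Returns the number of vector iterations executed.  */
size_t
vla_add (int *dst, const int *a, const int *b, size_t n, size_t vl)
{
  assert (vl > 0 && vl <= MAX_LANES);
  size_t iterations = 0;
  for (size_t i = 0; i < n; i += vl)
    {
      bool pred[MAX_LANES];

      /* Build the predicate: a lane is active while i + lane < n.
         Here i < n always holds, so n - i cannot wrap.  */
      for (size_t lane = 0; lane < vl; lane++)
        pred[lane] = lane < n - i;

      /* The same loop body serves full iterations and the final
         partial one; inactive lanes touch no memory.  */
      for (size_t lane = 0; lane < vl; lane++)
        if (pred[lane])
          dst[i + lane] = a[i + lane] + b[i + lane];

      iterations++;
    }
  return iterations;
}
```

With n = 10 and vl = 4 the model runs three iterations, the last with
only two active lanes, and there is no separate scalar epilogue. The
same source behaves correctly for any other vl without recompilation.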

In practice there isn't much overlap in the impact of these features on
the compiler, so it seemed better to write them up as separate documents.
I'll send those out as replies to this message.

We've uploaded most of our sources to ARM/sve-branch, in case anyone
wants to see at a glance what the final result of these changes looks like.
The patches follow a potential submission sequence that I'll again describe
in a follow-up email.

Ideally we'd like to include the support in GCC 7. I realise we've only just
made the Stage 1 deadline, and that it's a big change to be coming so late.
However, most of the code has been in use internally for a while now and
so is hopefully more mature than the late submission might suggest.

We have various other changes that aren't yet in the branch. The two
main ones are:

(a) Support for gather loads and scatter stores. This includes support
    for using gathers and scatters for strided loads and stores, or for
    grouped loads and stores whose group size is too large for a more
    efficient approach.

(b) Support for vectorising uncounted loops, i.e. those in which the number
    of iterations isn't known before the loop starts. This has two modes:

    (i) Use alignment to avoid partial faults in speculative loads,
        if alignment is reachable for all loads and if there are no other
        statements with side effects.

    (ii) Use the SVE first-faulting instruction for general speculative
         loads. This is the more general case and works regardless of
         alignment.
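
To give a flavour of mode (ii), here is a scalar C model of an uncounted
loop (a strlen-style search) built on a first-faulting load. Again this
is a sketch under stated assumptions rather than the implementation: the
names ff_load, vla_strlen, limit and MAX_LANES are hypothetical, and
limit models the number of accessible bytes, which real code would not
know. The modelled rule is that the first lane of a speculative load
must be accessible, while later lanes that would fault are reported as
not loaded instead of trapping.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LANES 64  /* hypothetical bound, needed only by this model */

/* Model of a first-faulting load of up to `vl` bytes from buf[i].
   `limit` is the number of accessible bytes in buf.  The first lane
   must be accessible (a genuine fault otherwise); later inaccessible
   lanes are simply not loaded.  Returns the number of lanes loaded.  */
static size_t
ff_load (char *out, const char *buf, size_t limit, size_t i, size_t vl)
{
  assert (i < limit);
  size_t loaded = 0;
  while (loaded < vl && i + loaded < limit)
    {
      out[loaded] = buf[i + loaded];
      loaded++;
    }
  return loaded;
}

/* Length of the NUL-terminated string in buf, found by loading
   speculatively past the point where the terminator may lie.  */
size_t
vla_strlen (const char *buf, size_t limit, size_t vl)
{
  assert (vl > 0 && vl <= MAX_LANES);
  size_t i = 0;
  for (;;)
    {
      char lanes[MAX_LANES];
      size_t loaded = ff_load (lanes, buf, limit, i, vl);

      /* Only inspect lanes that were actually loaded.  */
      for (size_t lane = 0; lane < loaded; lane++)
        if (lanes[lane] == '\0')
          return i + lane;

      /* Retry from the first lane that was not loaded.  */
      i += loaded;
    }
}
```

For example, vla_strlen ("hello", 6, 4) loads four bytes, then only the
two that remain accessible, and returns 5. No alignment assumption is
needed, which is the point of mode (ii).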

We hope to upload these changes to the branch soon.

This work was done by Alan Hayward, David Sherwood and me.

Thanks,
Richard
