Hi,

We'd like to submit GCC support for ARM's Scalable Vector Extension (SVE).
For more details about the extension itself, please see Nigel's blog
post at:

    https://community.arm.com/groups/processors/blog/2016/08/22/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture

Francesco has also written a white paper about programming for SVE,
with some worked examples:

    http://developer.arm.com/hpc/a-sneak-peek-into-sve-and-vla-programming

In summary though, there are two main features that have a significant
impact on the GCC port:

(1) The length of the registers is an implementation choice and is only
    known at runtime. In GCC terms, things like type sizes, mode sizes,
    numbers of vector elements, and frame offsets can all depend on a
    runtime parameter.

(2) The extension has been designed to enable the main vector loop to
    handle the epilogue as a final partial iteration, without loss of
    efficiency for non-epilogue iterations.

In practice there isn't much overlap in the impact of these features on
the compiler, so it seemed better to write them up as separate documents.
I'll send those out as replies to this message.

We've uploaded most of our sources to ARM/sve-branch, in case anyone
wants to see at a glance what the final result of these changes looks
like. The patches follow a potential submission sequence that I'll again
describe in a follow-up email.

Ideally we'd like to include the support in GCC 7. I realise we've only
just made the Stage 1 deadline, and that it's a big change to be coming
so late. However, most of the code has been in use internally for a
while now and so is hopefully more mature than the late submission might
suggest.

We have various other changes that aren't yet in the branch. The two
main ones are:

(a) Support for gather loads and scatter stores. This includes support
    for using gathers and scatters for strided loads and stores, or for
    grouped loads and stores whose group size is too large for a more
    efficient approach.

(b) Support for vectorising uncounted loops, i.e. those in which the
    number of iterations isn't known before the loop starts.
    This has two modes:

    (i) Use alignment to avoid partial faults in speculative loads,
        if alignment is reachable for all loads and if there are no
        other statements with side effects.

    (ii) Use the SVE first-faulting instruction for general speculative
         loads. This is the more general case and works regardless of
         alignment.

We hope to upload these changes to the branch soon.

This work was done by Alan Hayward, David Sherwood and myself.

Thanks,
Richard