Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd

Matthias Kretz Thu, 18 Jan 2024 00:41:18 -0800

On Thursday, 18 January 2024 08:40:48 CET Andrew Pinski wrote:
> On Wed, Jan 17, 2024 at 11:28 PM Matthias Kretz <[email protected]> wrote:
> > template <typename T>
> > struct Point
> > {
> >   T x, y, z;
> >   
> >   T distance_to_origin() {
> >     return sqrt(x * x + y * y + z * z);
> >   }
> > };
> > 
> > Point<float> is one point in 3D space, Point<simd<float>> stores multiple
> > points in 3D space and can work on them in parallel.
> > 
> > This implies that simd<T> must have a sizeof. C++ is unlikely to get
> > sizeless types (the discussions were long, there were many papers, ...).
> > Should sizeless types in C++ ever happen, then composition is likely going
> > to be constrained to the last data member.
> 
> Even this is a bad design in general for simd. It means the code needs
> to know the size.


Yes and no. The developer writes size-agnostic code. The person compiling the 
code chooses the size (via -m flags) and thus the compiler sees fixed-size 
code.
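Here is a minimal sketch of that point. It assumes GCC 11 or later with `<experimental/simd>` (Parallelism TS v2, C++17). The `Point` template from above is instantiated with `float` and with `native_simd<float>`. The source never names a width. The lane count, and therefore `sizeof(Point<floatv>)`, is fixed at compile time by the `-m` flags (e.g. 4 lanes with SSE, 8 with AVX on x86). The alias `floatv` and the helper `make_points` are made up for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <experimental/simd>

namespace stdx = std::experimental;

template <typename T>
struct Point
{
  T x, y, z;

  T distance_to_origin() const
  {
    using std::sqrt;  // float: std::sqrt; simd: stdx::sqrt found via ADL
    return sqrt(x * x + y * y + z * z);
  }
};

// Width chosen by the -m flags (e.g. 4 lanes for SSE, 8 for AVX).
using floatv = stdx::native_simd<float>;

// Hypothetical helper: lane i holds the point (i, 2i, 2i), whose distance
// to the origin is exactly 3i.
Point<floatv> make_points()
{
  return {floatv([](auto i) { return float(i); }),
          floatv([](auto i) { return 2.f * float(i); }),
          floatv([](auto i) { return 2.f * float(i); })};
}
```

The `using std::sqrt;` line lets one unqualified call serve both instantiations. `float` picks `std::sqrt`, and the simd overload is found by argument-dependent lookup.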

> Also AoS vs SoA is always an interesting point here. In some cases you
> want an array of structs for speed, and Point<simd<float>> does not work
> there at all. I guess this is all water under the bridge with how folks
> design code. You are basically pushing the AoSoA idea here, which is a
> much worse idea than before.

I like to call it "array of vectorized struct" (AoVS) instead of AoSoA to
emphasize the compiler-flag-dependent memory layout.
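This rough sketch makes the AoVS layout concrete. It makes the same assumption (GCC's `<experimental/simd>`). Each element of the container holds `floatv::size()` points in SoA form, so point `k` lives in block `k / W`, lane `k % W`. `PointCloud`, `set` and `get` are hypothetical names. The only point is that `W`, and with it the memory layout, comes from the compiler flags.

```cpp
#include <cassert>
#include <cstddef>
#include <experimental/simd>
#include <vector>

namespace stdx = std::experimental;
using floatv = stdx::native_simd<float>;

template <typename T>
struct Point
{
  T x, y, z;
};

// Hypothetical AoVS container: every element bundles floatv::size() points
// (one SoA block). The block width, and so the memory layout, is decided by
// the compiler flags, not by the source.
struct PointCloud
{
  std::vector<Point<floatv>> blocks;
  std::size_t count;

  explicit PointCloud(std::size_t n)
    : blocks((n + floatv::size() - 1) / floatv::size(),
             Point<floatv>{floatv(0.f), floatv(0.f), floatv(0.f)}),
      count(n)
  {}

  // Point k lives in block k / W, lane k % W (W = floatv::size()).
  void set(std::size_t k, Point<float> p)
  {
    assert(k < count);
    Point<floatv>& b = blocks[k / floatv::size()];
    const std::size_t lane = k % floatv::size();
    b.x[lane] = p.x;
    b.y[lane] = p.y;
    b.z[lane] = p.z;
  }

  Point<float> get(std::size_t k) const
  {
    assert(k < count);
    const Point<floatv>& b = blocks[k / floatv::size()];
    const std::size_t lane = k % floatv::size();
    return {b.x[lane], b.y[lane], b.z[lane]};
  }
};
```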

I've been doing a lot of heterogeneous SIMD programming since 2009, starting
with an outer-loop vectorization across many TUs of a high-energy physics code
targeting Intel Larrabee (pre-AVX-512 ISA) and SSE2 from a single source. In
all these years my experience has been that, if the problem allows, AoVS is
best in terms of performance and code generality & readability. I'd be
interested to learn why you think differently.

> That being said, sometimes it is not a vector of N elements you want to
> work on but rather 1/2/3 vectors of N elements. It seems like this is
> just pushing the idea of one vector of one type of element, which
> again is the wrong push.

I might have misunderstood. You're saying that sometimes I want a <float, 8>
even though my target CPU only has <float, 4> registers? Yes! The
std::experimental::simd spec and implementation aren't good enough in that area
yet, but the C++26 paper(s) and my prototype implementation provide perfect
SIMD + ILP translation of the expressed data-parallelism.
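This sketch shows the <float, 8> case with the TS's `fixed_size_simd`. It assumes GCC's `<experimental/simd>`, and `dot` is a hypothetical helper. The source asks for 8 lanes. On a target with 4-wide float registers, the implementation may store each `float8` as several native-width chunks, which is how GCC's fixed_size ABI is built. This only uses the TS interface; the C++26 interface may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <experimental/simd>

namespace stdx = std::experimental;

// Eight floats, independent of the target's native register width.
using float8 = stdx::fixed_size_simd<float, 8>;

// Hypothetical helper: dot product of two arrays whose length is a
// multiple of 8, processed 8 elements at a time.
float dot(const float* a, const float* b, std::size_t n)
{
  assert(n % float8::size() == 0);
  float8 acc = 0.f;  // broadcast 0 to all lanes
  for (std::size_t i = 0; i < n; i += float8::size())
  {
    const float8 va(a + i, stdx::element_aligned);
    const float8 vb(b + i, stdx::element_aligned);
    acc += va * vb;
  }
  return stdx::reduce(acc);  // horizontal sum of the 8 lanes
}
```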

> Moreover, I guess pushing one idea of SIMD is worse than pushing
> any idea of SIMD. For mathematical code, it is better for the compiler
> to do the vectorization than the user try to be semi-portable between
> different targets.

I guess I agree with that statement. But I wouldn't, in general, call the use
of simd<T> "the user try[ing] to be semi-portable". In my experience, working
on physics code - a lot of math - using simd<T> (as intended) is better than
relying on auto-vectorization, in terms of both performance and performance
portability. As always, abuse is possible ...
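Here is a rough illustration of "simd<T> as intended" as a minimal sketch. It assumes GCC's `<experimental/simd>`, and `axpy` is a hypothetical example, not a library function. The loop is written once against `native_simd<float>`. The target flags pick its width, and vectorization no longer depends on the auto-vectorizer's heuristics. The trailing scalar loop handles the elements that do not fill a whole vector.

```cpp
#include <cassert>
#include <cstddef>
#include <experimental/simd>

namespace stdx = std::experimental;
using floatv = stdx::native_simd<float>;

// y[i] = a * x[i] + y[i]; the vector width is whatever the -m flags make
// native, so the same source adapts to e.g. SSE, AVX or NEON.
void axpy(float a, const float* x, float* y, std::size_t n)
{
  std::size_t i = 0;
  for (; i + floatv::size() <= n; i += floatv::size())
  {
    const floatv vx(x + i, stdx::element_aligned);
    floatv vy(y + i, stdx::element_aligned);
    vy = a * vx + vy;  // a is broadcast to all lanes
    vy.copy_to(y + i, stdx::element_aligned);
  }
  for (; i < n; ++i)  // scalar remainder
    y[i] = a * x[i] + y[i];
}
```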

> This is what was learned with Fortran but I guess
> some folks in C++ like to expose the underlying HW instead of
> thinking high level here.

The C++ approach is to "leave no room for a lower-level language" while 
designing for high-level abstractions / usage.

> > With the above as our design constraints, SVE at first seems to be a bad
> > fit for implementing std::simd. However, if (at least initially) we accept
> > the need for different binaries for different SVE implementations, then
> > you can look at the "scalable" part of SVE as an efficient way of reducing
> > the number of opcodes necessary for supporting all kinds of different
> > vector lengths. But otherwise you can treat it as fixed-size registers -
> > which it is for a given CPU. In the case of a multi-CPU shared-memory
> > system (e.g. RDMA between different ARM implementations) all you need is
> > a different name for incompatible types. So std::simd<float> on SVE256
> > must have a different name than on SVE512. Same for std::simd<float, 8>
> > (which is currently not the case with Srinivas's patch, I think, and
> > needs to be resolved).
> 
> For SVE that is a bad design. It means the code is not portable at all.

When you say "code" you mean "source code", not binaries, right? I don't see 
how that follows.
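One way to picture "a different name for incompatible types" while keeping the source portable is to encode the vector length in the ABI tag. The same source then yields a different type, and different mangled symbols, for each `-msve-vector-bits` setting. The names `sve_abi`, `my_simd` and `first_lane` are hypothetical and are not what libstdc++ or the SVE patch use. `__ARM_FEATURE_SVE_BITS` is the ACLE macro that GCC and Clang define for `-msve-vector-bits=N`. The fallback of 256 only lets the sketch compile on other targets.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical ABI tag recording the vector length in bits; this is not
// the tag used by libstdc++ or by the SVE patch, only an illustration.
template <int Bits>
struct sve_abi {};

// Hypothetical stand-in for simd<T, Abi>.
template <typename T, typename Abi>
struct my_simd;

template <typename T, int Bits>
struct my_simd<T, sve_abi<Bits>>
{
  static constexpr std::size_t lanes = Bits / (8 * sizeof(T));
  T data[lanes];  // fixed size for any given compilation
};

#if defined(__ARM_FEATURE_SVE_BITS) && __ARM_FEATURE_SVE_BITS > 0
// ACLE macro set for -msve-vector-bits=N.
constexpr int native_sve_bits = __ARM_FEATURE_SVE_BITS;
#else
constexpr int native_sve_bits = 256;  // arbitrary fallback for this sketch
#endif

// Same source, but the type depends on the compiler flags.
using native_floatv = my_simd<float, sve_abi<native_sve_bits>>;

// SVE256 and SVE512 vectors are distinct, incompatible types.
static_assert(!std::is_same_v<my_simd<float, sve_abi<256>>,
                              my_simd<float, sve_abi<512>>>);

// The tag is part of this function's mangled name, so a call compiled for
// one vector length cannot bind to a definition compiled for another.
float first_lane(const native_floatv& v) { return v.data[0]; }
```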

- Matthias

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Center for Heavy Ion Research               https://gsi.de
 std::simd
──────────────────────────────────────────────────────────────────────────
