On Thursday, 18 January 2024 08:40:48 CET Andrew Pinski wrote:
> On Wed, Jan 17, 2024 at 11:28 PM Matthias Kretz <m.kr...@gsi.de> wrote:
> > template <typename T>
> > struct Point
> > {
> >   T x, y, z;
> >
> >   T distance_to_origin() {
> >     return sqrt(x * x + y * y + z * z);
> >   }
> > };
> >
> > Point<float> is one point in 3D space, Point<simd<float>> stores multiple
> > points in 3D space and can work on them in parallel.
> >
> > This implies that simd<T> must have a sizeof. C++ is unlikely to get
> > sizeless types (the discussions were long, there were many papers, ...).
> > Should sizeless types in C++ ever happen, then composition is likely going
> > to be constrained to the last data member.
>
> Even this is a bad design in general for simd. It means the code needs
> to know the size.
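A minimal, self-contained sketch of the composition argument in the quoted `Point` example. `Vec<T, N>` is a hypothetical fixed-width stand-in for `simd<T>` (it is not the `std::simd` API); it only shows that once the lane type has a `sizeof`, the same `Point` template works for one point or for N points at once. A `using std::sqrt;` is added so that the scalar instantiation calls `std::sqrt(float)` while the vector instantiation finds `sqrt(Vec)` through argument-dependent lookup.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for simd<T>: N lanes of T with element-wise math.
// This is NOT the std::simd API; it provides only what Point<T> needs.
template <typename T, std::size_t N>
struct Vec
{
  std::array<T, N> lane{};

  // Read one lane (bounds-checked when NDEBUG is not defined).
  T operator[](std::size_t i) const
  {
    assert(i < N);
    return lane[i];
  }

  friend Vec operator+(Vec a, const Vec& b)
  {
    for (std::size_t i = 0; i < N; ++i) a.lane[i] += b.lane[i];
    return a;
  }

  friend Vec operator*(Vec a, const Vec& b)
  {
    for (std::size_t i = 0; i < N; ++i) a.lane[i] *= b.lane[i];
    return a;
  }

  // Hidden friend: found by argument-dependent lookup from Point<T>.
  friend Vec sqrt(Vec a)
  {
    for (T& v : a.lane) v = std::sqrt(v);
    return a;
  }
};

// The Point template from the quoted mail, plus `using std::sqrt`.
template <typename T>
struct Point
{
  T x, y, z;

  T distance_to_origin() const
  {
    using std::sqrt;  // scalar case; Vec's sqrt is found via ADL
    return sqrt(x * x + y * y + z * z);
  }
};

// Composition needs a complete type with a sizeof: a Point of 4-lane packs
// is three packs laid out back to back.
static_assert(sizeof(Point<Vec<float, 4>>) == 3 * sizeof(Vec<float, 4>));
```

With this sketch, `Point<float>{3.0f, 4.0f, 12.0f}.distance_to_origin()` yields `13.0f`, and `Point<Vec<float, 4>>{x, y, z}.distance_to_origin()` yields a `Vec<float, 4>` holding four distances computed by the same source.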
Yes and no. The developer writes size-agnostic code. The person compiling the code chooses the size (via -m flags), and thus the compiler sees fixed-size code.

> Also AoS vs SoA is always an interesting point here. In some cases you
> want an array of structs
> for speed and Point<simd<float>> does not work there at all. I guess
> This is all water under the bridge with how folks design code.
> You are basically pushing AoSoA idea here which is much worse idea than
> before.

I like to call it "array of vectorized struct" (AoVS) instead of AoSoA to emphasize the compiler-flag-dependent memory layout.

I've been doing a lot of heterogeneous SIMD programming since 2009, starting with an outer-loop vectorization across many TUs of a high-energy physics code targeting Intel Larrabee (pre-AVX512 ISA) and SSE2 from one source. In all these years my experience has been that, if the problem allows, AoVS is best in terms of performance and code generality and readability. I'd be interested to learn why you think differently.

> That being said sometimes it is not a vector of N elements you want to
> work on but rather 1/2/3 vector of N elements. Seems like this is
> just pushing the idea one of one vector of one type of element which
> again is wrong push.

I might have misunderstood. You're saying that sometimes I want a <float, 8> even though my target CPU only has <float, 4> registers? Yes! The std::experimental::simd spec and implementation aren't good enough in that area yet, but the C++26 paper(s) and my prototype implementation provide perfect SIMD + ILP translation of the expressed data-parallelism.

> Also more over, I guess pushing one idea of SIMD is worse than pushing
> any idea of SIMD. For Mathematical code, it is better for the compiler
> to do the vectorization than the user try to be semi-portable between
> different targets.

I guess I agree with that statement. But I wouldn't, in general, call the use of simd<T> "the user try[ing] to be semi-portable".
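A minimal sketch of the AoVS layout, assuming a hypothetical compile-time lane count `kLanes` that stands in for what the target's `-m` flags would select (for `simd<float>` that would be its `size()`). The names `PointAoS`, `PointBlock`, `to_aovs`, and `distances` are illustrative only, and plain `std::array` loops stand in for `simd<float>` operations.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical lane count; 4 is an assumption, not taken from any target.
constexpr std::size_t kLanes = 4;

// Conventional array-of-structs element: one point.
struct PointAoS
{
  float x, y, z;
};

// One "vectorized struct": kLanes points stored component-wise.
struct PointBlock
{
  std::array<float, kLanes> x{}, y{}, z{};
};

// Point i goes to block i / kLanes, lane i % kLanes; unused tail lanes stay 0.
std::vector<PointBlock> to_aovs(const std::vector<PointAoS>& pts)
{
  std::vector<PointBlock> blocks((pts.size() + kLanes - 1) / kLanes);
  for (std::size_t i = 0; i < pts.size(); ++i) {
    PointBlock& b = blocks[i / kLanes];
    b.x[i % kLanes] = pts[i].x;
    b.y[i % kLanes] = pts[i].y;
    b.z[i % kLanes] = pts[i].z;
  }
  return blocks;
}

// Distance to the origin for the first `count` points stored in `blocks`.
std::vector<float> distances(const std::vector<PointBlock>& blocks, std::size_t count)
{
  assert(count <= blocks.size() * kLanes);
  std::vector<float> out(count);
  for (std::size_t b = 0; b < blocks.size(); ++b) {
    const PointBlock& p = blocks[b];
    std::array<float, kLanes> d{};
    // Same operation on kLanes contiguous floats per component: a loop a
    // compiler may map to SIMD instructions.
    for (std::size_t l = 0; l < kLanes; ++l)
      d[l] = std::sqrt(p.x[l] * p.x[l] + p.y[l] * p.y[l] + p.z[l] * p.z[l]);
    // Copy back only lanes that hold real points (skip tail padding).
    for (std::size_t l = 0; l < kLanes && b * kLanes + l < count; ++l)
      out[b * kLanes + l] = d[l];
  }
  return out;
}
```

Changing `kLanes` changes the memory layout but not the source of `distances`, which mirrors the compiler-flag-dependent layout that motivates the AoVS name. Setting it to 8 on a 4-lane target would correspond to the `<float, 8>` on `<float, 4>` registers case; whether a compiler then emits two independent 4-wide operations per block is up to its vectorizer.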
In my experience, working on physics code (a lot of math), using simd<T> as intended is better in terms of both performance and performance portability. As always, abuse is possible...

> This is what was learned on Fortran but I guess
> some folks in the C++ likes to expose the underlying HW instead of
> thinking high level here.

The C++ approach is to "leave no room for a lower-level language" while designing for high-level abstractions and usage.

> > With the above as our design constraints, SVE at first seems to be a bad
> > fit for implementing std::simd. However, if (at least initially) we accept
> > the need for different binaries for different SVE implementations, then you
> > can look at the "scalable" part of SVE as an efficient way of reducing the
> > number of opcodes necessary for supporting all kinds of different vector
> > lengths. But otherwise you can treat it as fixed-size registers - which it
> > is for a given CPU. In the case of a multi-CPU shared-memory system (e.g.
> > RDMA between different ARM implementations) all you need is a different
> > name for incompatible types. So std::simd<float> on SVE256 must have a
> > different name on SVE512. Same for std::simd<float, 8> (which is currently
> > not the case with Sriniva's patch, I think, and needs to be resolved).
>
> For SVE that is a bad design. It means The code is not portable at all.

When you say "code", you mean "source code", not binaries, right? I don't see how that follows.

- Matthias

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Center for Heavy Ion Research               https://gsi.de
 std::simd
──────────────────────────────────────────────────────────────────────────