On Wed, Jan 17, 2024 at 11:28 PM Matthias Kretz <m.kr...@gsi.de> wrote:
>
> On Thursday, 4 January 2024 10:10:12 CET Andrew Pinski wrote:
> > I really doubt this would work in the end. Because HW which is 128bits
> > only, can't support -msve-vector-bits=2048 . I am thinking
> > std::experimental::simd is not the right way of supporting this.
> > Really the route the standard should be heading towards is non
> > constant at compile time sized vectors instead and then you could use
> > the constant sized ones to emulate the Variable length ones.
>
> I don't follow. "non-constant at compile time sized vectors" implies
> sizeless (no constexpr sizeof), no?
> Let me try to explain where I'm coming from. One of the motivating use
> cases for simd types is composition. Like this:
>
> template <typename T>
> struct Point
> {
>   T x, y, z;
>
>   T distance_to_origin() {
>     return sqrt(x * x + y * y + z * z);
>   }
> };
>
> Point<float> is one point in 3D space, Point<simd<float>> stores multiple
> points in 3D space and can work on them in parallel.
>
> This implies that simd<T> must have a sizeof. C++ is unlikely to get
> sizeless types (the discussions were long, there were many papers, ...).
> Should sizeless types in C++ ever happen, then composition is likely going
> to be constrained to the last data member.

Even this is a bad design in general for simd. It means the code needs
to know the vector size.
AoS vs. SoA is also always an interesting point here. In some cases you
want an array of structs for speed, and Point<simd<float>> does not work
there at all. I guess this is all water under the bridge given how folks
design code. You are basically pushing the AoSoA idea here, which is a
much worse idea than before.
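
To make the three layouts concrete, here is a minimal sketch. `Pack` is a
hypothetical fixed-width stand-in for simd<float> (a plain std::array with
an assumed width of 4 lanes); it is not the std::experimental::simd API,
and `to_aosoa` / `x_of` are illustrative helper names only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for simd<float>: a fixed number of lanes.
// The width 4 is an assumption made for illustration only.
constexpr std::size_t kLanes = 4;
using Pack = std::array<float, kLanes>;

template <typename T>
struct Point { T x, y, z; };

// AoS: one struct per point; x, y, z of a point are adjacent in memory.
using AoS = std::vector<Point<float>>;

// SoA: one array per coordinate; no vector width appears in the type.
struct SoA { std::vector<float> x, y, z; };

// AoSoA: the layout Point<simd<float>> produces; kLanes points per struct,
// so the vector width is baked into the data layout.
using AoSoA = std::vector<Point<Pack>>;

// Convert AoS to AoSoA, zero-padding the last block.
inline AoSoA to_aosoa(const AoS& in) {
  AoSoA out((in.size() + kLanes - 1) / kLanes, Point<Pack>{});
  for (std::size_t i = 0; i < in.size(); ++i) {
    Point<Pack>& block = out[i / kLanes];
    block.x[i % kLanes] = in[i].x;
    block.y[i % kLanes] = in[i].y;
    block.z[i % kLanes] = in[i].z;
  }
  return out;
}

// Read back the x coordinate of point i from the blocked layout.
inline float x_of(const AoSoA& a, std::size_t i) {
  assert(i / kLanes < a.size());
  return a[i / kLanes].x[i % kLanes];
}
```

Note that every index computation above depends on kLanes, which is the
"code needs to know the size" problem: change the width and both the
layout and the indexing change with it.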

That being said, sometimes it is not one vector of N elements you want to
work on but rather 1/2/3 vectors of N elements. This seems to be
just pushing the idea of one vector of one element type, which
again is the wrong push.
Moreover, I guess pushing one idea of SIMD is worse than pushing
any idea of SIMD. For mathematical code, it is better for the compiler
to do the vectorization than for the user to try to be semi-portable
between different targets. This is what was learned with Fortran, but I
guess some folks in the C++ community like to expose the underlying HW
instead of thinking at a high level here.
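
For comparison, the compiler-driven approach is just scalar code over SoA
data. The following is a minimal sketch, not a benchmark; the function
name and signature are hypothetical, and whether a given compiler actually
vectorizes the loop depends on the target, the optimization level, and the
floating-point flags in use.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar loop over SoA data. No vector width appears in the source, so
// the compiler is free to choose the vector length for the target.
inline void distance_to_origin(const std::vector<float>& x,
                               const std::vector<float>& y,
                               const std::vector<float>& z,
                               std::vector<float>& out) {
  assert(x.size() == y.size() && y.size() == z.size());
  const std::size_t n = x.size();
  out.resize(n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = std::sqrt(x[i] * x[i] + y[i] * y[i] + z[i] * z[i]);
  }
}
```

The same source can be rebuilt for 128-bit or wider hardware without any
change to the data layout.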

Thanks,
Andrew Pinski

>
> With the above as our design constraints, SVE at first seems to be a bad
> fit for implementing std::simd. However, if (at least initially) we accept
> the need for different binaries for different SVE implementations, then you
> can look at the "scalable" part of SVE as an efficient way of reducing the
> number of opcodes necessary for supporting all kinds of different vector
> lengths. But otherwise you can treat it as fixed-size registers - which it
> is for a given CPU. In the case of a multi-CPU shared-memory system (e.g.
> RDMA between different ARM implementations) all you need is a different
> name for incompatible types. So std::simd<float> on SVE256 must have a
> different name on SVE512. Same for std::simd<float, 8> (which is currently
> not the case with Srinivas's patch, I think, and needs to be resolved).

For SVE that is a bad design. It means the code is not portable at all.
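
The "different name for incompatible types" idea quoted above can be
sketched by making the vector width part of the type. This is only an
illustration under stated assumptions: `fixed_vec` and `native_bits` are
hypothetical names, the 128-bit fallback is an assumption, and none of it
is taken from the patch under discussion. `__ARM_FEATURE_SVE_BITS` is the
ACLE macro that reflects -msve-vector-bits=N. The sketch assumes C++17.

```cpp
#include <cstddef>
#include <type_traits>

// Width the binary was compiled for. The macro is nonzero only when a
// fixed SVE length was requested at compile time.
#if defined(__ARM_FEATURE_SVE_BITS) && __ARM_FEATURE_SVE_BITS > 0
inline constexpr std::size_t native_bits = __ARM_FEATURE_SVE_BITS;
#else
inline constexpr std::size_t native_bits = 128;  // assumed fallback
#endif

// Hypothetical vector type: the width in bits is a template parameter,
// so it is part of the type's name (and therefore of its mangled name).
template <typename T, std::size_t Bits = native_bits>
struct fixed_vec {
  static constexpr std::size_t size = Bits / (8 * sizeof(T));
  T lanes[size];
};

// Types built for SVE256 and SVE512 are distinct and incompatible.
static_assert(!std::is_same_v<fixed_vec<float, 256>, fixed_vec<float, 512>>);
static_assert(sizeof(fixed_vec<float, 256>) == 32);
static_assert(fixed_vec<float, 512>::size == 16);
```

Code that uses the default width is tied to the -msve-vector-bits value it
was compiled with, which is exactly the portability cost.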

>
> > I think we should not depend on __ARM_FEATURE_SVE_BITS being set here
> > and being meaningful in any way.
>
> I'd love to. In the same way I'd love to *not depend* on __AVX__,
> __AVX512F__ etc.
>
> - Matthias
>
> --
> ──────────────────────────────────────────────────────────────────────────
>  Dr. Matthias Kretz                           https://mattkretz.github.io
>  GSI Helmholtz Center for Heavy Ion Research               https://gsi.de
>  std::simd
> ──────────────────────────────────────────────────────────────────────────
