arsenm wrote:

> > I can understand handling the other primitive types, like short2 or a
> > pointer, but I think it's unreasonable for this builtin to support all of
> > these aggregates
>
> Without builtin support, users have to manually decompose structs into 32-bit
> words and reassemble them. This is tedious and error-prone.

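For reference, the manual decomposition being discussed can be sketched on the host as below. This is only an illustration under stated assumptions: `permute_by_words`, `Sample`, and `identity_permute32` are hypothetical names that do not come from the PR, and the callable `permute32` stands in for whatever 32-bit-only permute the builtin would provide. The sketch uses `std::memcpy` for portability on the host; it does not demonstrate what a GPU compiler does with such copies.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: split a trivially copyable value into 32-bit words,
// apply a caller-supplied 32-bit permute to each word, and reassemble.
template <typename T, typename Permute32>
T permute_by_words(const T &value, Permute32 permute32) {
  static_assert(std::is_trivially_copyable<T>::value,
                "only trivially copyable types can be split into words");

  // Round the size up to a whole number of 32-bit words.
  constexpr std::size_t kWords = (sizeof(T) + 3) / 4;

  // Zero-initialize so any tail bytes beyond sizeof(T) are well defined.
  std::uint32_t words[kWords] = {};
  std::memcpy(words, &value, sizeof(T));

  // One 32-bit permute per word.
  for (std::size_t i = 0; i < kWords; ++i)
    words[i] = permute32(words[i]);

  // Reassemble the value from the permuted words.
  T result{};
  std::memcpy(&result, words, sizeof(T));
  return result;
}

// Hypothetical example aggregate that is not a single 32-bit value.
struct Sample {
  float a;
  std::int32_t b;
  std::uint16_t c;
};

// Host-side stand-in for a real cross-lane permute: returns its input.
inline std::uint32_t identity_permute32(std::uint32_t word) { return word; }
```

A caller would write something like `Sample out = permute_by_words(in, identity_permute32);`, replacing the stand-in with the real 32-bit permute on the target.
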
But that is exactly what they should do. Special aggregate handling is also an extra hazard in the compiler. General users should not be using builtins.

> Worse, if they use memcpy or pointer casts, the compiler may introduce
> scratch memory that it cannot optimize away. Our CodeGen avoids this: it
> stores the aggregate, loads as integer, splits into 32-bit words, permutes
> each, and reassembles. All integer ops, no scratch, SROA-friendly. This is
> similar to how C++ lets you assign structs by value instead of requiring
> memcpy. Permute is a fundamental warp operation on GPU. Making it work
> transparently for arbitrary trivially-copyable types is a big usability win,
> and the implementation is modest: a single loop over 32-bit words in one
> self-contained function.

None of this has anything to do with this builtin. This builtin should only accept trivially legal 32-bit types.

https://github.com/llvm/llvm-project/pull/153501
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
