yxsamliu wrote:

> I can understand handling the other primitive types, like short2 or a
> pointer, but I think it's unreasonable for this builtin to support all of
> these aggregates

Without builtin support, users have to manually decompose structs into 32-bit words and reassemble them, which is tedious and error-prone. Worse, if they use memcpy or pointer casts, the compiler may introduce scratch memory that it cannot optimize away.

Our CodeGen avoids this: it stores the aggregate, loads it as an integer, splits that integer into 32-bit words, permutes each word, and reassembles the result. All integer ops, no scratch, SROA-friendly. This is similar to how C++ lets you assign structs by value instead of requiring memcpy.

Permute is a fundamental warp operation on GPUs. Making it work transparently for arbitrary trivially-copyable types is a big usability win, and the implementation is modest: a single loop over 32-bit words in one self-contained function.

https://github.com/llvm/llvm-project/pull/153501
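
To make the word loop concrete, here is a minimal host-side sketch in plain C++. It is not the code from the PR. `permute_word`, `permute_aggregate`, and `Sample` are hypothetical names, and the warp is simulated as a `std::vector` with one element per lane. `memcpy` is used only as the portable host-side way to view the bytes as words; it says nothing about what the device CodeGen emits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a 32-bit hardware permute: lane i receives the
// word currently held by lane src_lane[i].
std::vector<std::uint32_t> permute_word(const std::vector<std::uint32_t>& lanes,
                                        const std::vector<int>& src_lane) {
    assert(lanes.size() == src_lane.size());
    std::vector<std::uint32_t> out(lanes.size());
    for (std::size_t i = 0; i < lanes.size(); ++i)
        out[i] = lanes[static_cast<std::size_t>(src_lane[i])];
    return out;
}

// Permute an arbitrary trivially-copyable T by splitting it into 32-bit
// words, permuting each word independently, and reassembling.
template <typename T>
std::vector<T> permute_aggregate(const std::vector<T>& lanes,
                                 const std::vector<int>& src_lane) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    constexpr std::size_t kWords = (sizeof(T) + 3) / 4;  // round up to whole words
    const std::size_t n = lanes.size();

    // words[w][lane] holds the w-th 32-bit word of that lane's value.
    std::vector<std::vector<std::uint32_t>> words(
        kWords, std::vector<std::uint32_t>(n, 0));
    for (std::size_t lane = 0; lane < n; ++lane) {
        std::array<std::uint32_t, kWords> buf{};  // zero-pads a partial tail word
        std::memcpy(buf.data(), &lanes[lane], sizeof(T));
        for (std::size_t w = 0; w < kWords; ++w) words[w][lane] = buf[w];
    }

    // The single loop over 32-bit words: one word-level permute per word.
    for (std::size_t w = 0; w < kWords; ++w)
        words[w] = permute_word(words[w], src_lane);

    // Reassemble each lane's value from its permuted words.
    std::vector<T> result(n);
    for (std::size_t lane = 0; lane < n; ++lane) {
        std::array<std::uint32_t, kWords> buf{};
        for (std::size_t w = 0; w < kWords; ++w) buf[w] = words[w][lane];
        std::memcpy(&result[lane], buf.data(), sizeof(T));
    }
    return result;
}

// Hypothetical aggregate with mixed member sizes and padding.
struct Sample {
    float x;
    std::int16_t tag;
    char c;
};
```

Calling `permute_aggregate` on four `Sample` lanes with source lanes `{3, 2, 1, 0}` reverses the values across lanes. The caller never sees the word decomposition, which is the usability point being argued for.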
