yxsamliu wrote:

> I can understand handling the other primitive types, like short2 or a 
> pointer, but I think it's unreasonable for this builtin to support all of 
> these aggregates

Without builtin support, users have to manually decompose structs into 32-bit 
words and reassemble them, which is tedious and error-prone. Worse, if they use 
memcpy or pointer casts, the compiler may introduce scratch memory that it 
cannot optimize away. Our CodeGen avoids this: it stores the aggregate, loads 
it as an integer, splits that into 32-bit words, permutes each word, and 
reassembles the result. These are all integer ops, with no scratch, and the 
sequence is SROA-friendly. This is similar to how C++ lets you assign structs 
by value instead of requiring memcpy. Permute is a fundamental warp operation 
on GPUs. Making it work transparently for arbitrary trivially-copyable types 
is a big usability win, and the implementation is modest: a single loop over 
32-bit words in one self-contained function.
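
As a rough host-side illustration of that word-wise scheme (this is not the 
actual CodeGen, just a sketch): `kLanes`, `Sample`, and `permute_aggregate` 
are made-up names, the array of lanes stands in for the hardware's 32-bit 
permute, `memcpy` only models the "store, then load as integer" step, and 
zero-padding the last word is an assumption of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical warp width for this host-side model.
constexpr std::size_t kLanes = 4;

// Example trivially-copyable aggregate (illustrative only).
struct Sample {
  float x;
  std::int16_t tag;
  char c;
};

// Lane i receives the value held by lane src[i], moved one 32-bit word at a
// time. Each inner assignment models one 32-bit permute.
template <typename T>
std::array<T, kLanes> permute_aggregate(
    const std::array<T, kLanes>& vals,
    const std::array<std::size_t, kLanes>& src) {
  static_assert(std::is_trivially_copyable<T>::value,
                "only trivially-copyable types can be moved word by word");
  constexpr std::size_t kWords = (sizeof(T) + 3) / 4;  // round up to 32 bits
  using Words = std::array<std::uint32_t, kWords>;

  // Split: view each lane's value as zero-padded 32-bit words.
  std::array<Words, kLanes> in{};
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    std::memcpy(in[lane].data(), &vals[lane], sizeof(T));

  // Permute: a single loop over the words, applied to every lane.
  std::array<Words, kLanes> out{};
  for (std::size_t w = 0; w < kWords; ++w) {
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
      assert(src[lane] < kLanes);  // source lane must exist
      out[lane][w] = in[src[lane]][w];
    }
  }

  // Reassemble: copy the permuted words back into values of type T.
  std::array<T, kLanes> result = vals;
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    std::memcpy(&result[lane], out[lane].data(), sizeof(T));
  return result;
}
```

With `src = {3, 2, 1, 0}`, for example, each lane ends up holding the 
`Sample` that started in the mirrored lane, with the caller never touching 
individual words.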


https://github.com/llvm/llvm-project/pull/153501
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
