nikic wrote: On x86, what we actually end up doing is combining those into unaligned i64 loads (see https://godbolt.org/z/P5z674x4r), which is probably the best outcome if unaligned loads are supported. I assume AMDGPU does not support unaligned loads, and that's why you want single-element loads that get inserted into a vector, followed by sub-vector extracts on that vector?
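To make the two shapes being compared concrete, here is a rough C++ sketch. It is not the IR from the PR or from the godbolt link; the function names are hypothetical, the 32-bit element width is chosen only for illustration, and a little-endian target is assumed. The first function performs two narrow element loads and packs them, the second performs one 64-bit load with no alignment assumption (std::memcpy is the portable way to spell an unaligned load).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical element-wise form: two separate 32-bit loads from a byte
// pointer with no alignment guarantee, packed into one 64-bit value.
// The packing order assumes a little-endian target.
inline std::uint64_t load_pair_elementwise(const unsigned char *p) {
    std::uint32_t lo, hi;
    std::memcpy(&lo, p, sizeof lo);      // first element
    std::memcpy(&hi, p + 4, sizeof hi);  // second element
    return static_cast<std::uint64_t>(lo) |
           (static_cast<std::uint64_t>(hi) << 32);
}

// Hypothetical combined form: a single 64-bit load from the same address,
// again with no alignment assumption.
inline std::uint64_t load_pair_combined(const unsigned char *p) {
    std::uint64_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Sanity check that both forms read the same bytes on a little-endian host.
inline void check_equivalence() {
    unsigned char buf[16];
    for (int i = 0; i < 16; ++i) buf[i] = static_cast<unsigned char>(i);
    const unsigned char *p = buf + 1;  // deliberately misaligned
    assert(load_pair_elementwise(p) == load_pair_combined(p));
}
```

Whether the combined form is preferable depends on whether the target handles unaligned wide loads efficiently, which is the question raised above.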

https://github.com/llvm/llvm-project/pull/133301
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits