Jonathan Wright <jonathan.wri...@arm.com> writes: > Hi, > > As subject, this patch uses a union instead of constructing a new opaque > vector structure for each of the vqtbl[234] Neon intrinsics in arm_neon.h. > This simplifies the header file and also improves code generation - > superfluous move instructions were emitted for every register > extraction/set in this additional structure. > > This change is safe because the C-level vector structure types e.g. > uint8x16x4_t already provide a tie for sequential register allocation > - which is required by the TBL instructions. > > Regression tested and bootstrapped on aarch64-none-linux-gnu - no > issues. > > Ok for master?
Looks good, but I think we should have some tests to defend the RA improvements. E.g. have things like: #include <arm_neon.h> … uint8x8_t f2_u8 (uint8x16x2_t x, uint8x8_t y) { return vqtbl2_u8 (x, y); } … and add a scan-assembler-not for moves. Union punning is UB for standard C++, but I think in practice we're not going to be able to treat it as such for GCC. This would be far from the only thing to rely on union punning for correctness. Thanks, Richard