https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113537
Bug ID: 113537
Summary: ext should be used more for __builtin_shufflevector
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: aarch64

Take:
```
#define vector4 __attribute__((vector_size(4)))
#define vector8 __attribute__((vector_size(8)))
#define vector16 __attribute__((vector_size(16)))

vector8 char f3(vector16 char a)
{
  return __builtin_shufflevector (a, a, 1, 2, 3, 4, 5, 6, 7, 8);
}

vector8 char f2(vector16 char a)
{
  return __builtin_shufflevector (a, a, 1, 2, 3, 4, 5, 6, 7, 0);
}
```

Currently GCC produces:
```
f3:
        adrp    x0, .LC0
        ldr     q31, [x0, #:lo12:.LC0]
        tbl     v0.16b, {v0.16b}, v31.16b
        ret
f2:
        adrp    x0, .LC1
        ldr     q31, [x0, #:lo12:.LC1]
        tbl     v0.16b, {v0.16b}, v31.16b
        ret
```

But these should be optimized to just:
```
f3:
        ext     v0.16b, v0.16b, v0.16b, #1
        ret
f2:
        ext     v0.8b, v0.8b, v0.8b, #1
        ret
```
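
For reference, here is a rough scalar sketch of why the two `ext` forms would give the same bytes as the shuffles above. It is not GCC code: `ext_model` and `ext_matches_bug_examples` are hypothetical names made up for this illustration, and the model assumes the usual reading of `EXT` (concatenate the two sources with the first one in the low bytes, then take consecutive bytes starting at the immediate).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of AArch64 EXT Vd, Vn, Vm, #imm:
   view hi:lo as one 2*width-byte value (lo = Vn in the low bytes)
   and copy out `width` consecutive bytes starting at byte `imm`. */
static void ext_model(uint8_t *dst, const uint8_t *lo, const uint8_t *hi,
                      size_t width, size_t imm)
{
    uint8_t tmp[16];
    assert(width <= 16 && imm < width);
    for (size_t i = 0; i < width; i++) {
        size_t j = i + imm;
        tmp[i] = (j < width) ? lo[j] : hi[j - width];
    }
    memcpy(dst, tmp, width);
}

/* Returns 1 if the modelled EXT results match the shuffle masks
   used by f3 and f2 in the report, 0 otherwise. */
int ext_matches_bug_examples(void)
{
    static const int f3_idx[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    static const int f2_idx[8] = {1, 2, 3, 4, 5, 6, 7, 0};
    uint8_t a[16], want[8], got[16];

    for (int i = 0; i < 16; i++)
        a[i] = (uint8_t)(0xA0 + i);   /* distinct byte per lane */

    /* f3: 128-bit EXT #1; the 8-byte result is the low half. */
    ext_model(got, a, a, 16, 1);
    for (int i = 0; i < 8; i++)
        want[i] = a[f3_idx[i]];
    if (memcmp(got, want, 8) != 0)
        return 0;

    /* f2: 64-bit EXT #1 on the low half, so byte 0 wraps to lane 7. */
    ext_model(got, a, a, 8, 1);
    for (int i = 0; i < 8; i++)
        want[i] = a[f2_idx[i]];
    if (memcmp(got, want, 8) != 0)
        return 0;

    return 1;
}
```

The f3 case needs the 128-bit form because lane 7 of the result comes from byte 8 of the input, while f2 only reads bytes 0 to 7 and so fits the 64-bit form.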