https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104611
Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #1 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #0)
> Take:
>
> bool f(char *a)
> {
>     char t[] = "0123456789012345678901234567890";
>     return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
> }
>
> Right now GCC uses branches to optimize this but this could be done via a
> few loads followed by xor (eor) of the two sides and then oring the results
> of xor
> and then umaxv and then comparing that to 0. This can be done for the
> middle-end code too if there is a max reduction opcode.

It's not worth optimizing small inline memcmp using vector instructions - the
umaxv and the move back to the integer side add extra latency.

However, the expansion could be more efficient and use the same sequence used
in GLIBC memcmp:

        ldp     data1, data3, [src1, 16]
        ldp     data2, data4, [src2, 16]
        cmp     data1, data2
        ccmp    data3, data4, 0, eq
        b.ne    L(return2)

Also, the array t[] gets copied onto the stack instead of the string literal
being used directly.
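
For illustration only, the integer-side, branch-free form of the comparison
discussed above can be sketched in portable C. This is a rough sketch under
assumptions, not what GCC emits today: equal32 is a hypothetical helper name,
it hard-codes the 32-byte size of t[] (31 characters plus the terminating
NUL), and whether a given compiler lowers it to ldp/cmp/ccmp or to eor/orr is
not guaranteed. Declaring t as static const is one way a source file could
sidestep the stack copy.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of GCC or GLIBC): equality test for two
   32-byte buffers using four 64-bit loads per side, XOR of each pair, and
   a single OR reduction, so no branch is needed until the final result. */
static bool equal32(const void *a, const void *b)
{
    uint64_t x[4], y[4];

    /* memcpy keeps the loads alignment-safe; compilers normally turn
       these fixed-size copies into plain loads. */
    memcpy(x, a, sizeof x);
    memcpy(y, b, sizeof y);

    /* Any differing bit in any word leaves a nonzero bit in diff. */
    uint64_t diff = (x[0] ^ y[0]) | (x[1] ^ y[1])
                  | (x[2] ^ y[2]) | (x[3] ^ y[3]);
    return diff == 0;
}

bool f(const char *a)
{
    /* static const keeps the string in read-only data rather than
       building a copy on the stack at each call. */
    static const char t[] = "0123456789012345678901234567890";
    _Static_assert(sizeof t == 32, "equal32 assumes a 32-byte operand");
    return equal32(a, t);
}
```

The caller must pass a buffer of at least 32 readable bytes, the same
requirement the original memcmp call with sizeof(t) has.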