https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82260
--- Comment #5 from Peter Cordes <peter at cordes dot ca> ---
> (not (match_test "TARGET_PARTIAL_REG_STALL"))))))

gcc is doing this even with -mtune=core2.

Core2 / Nehalem stall (the front-end) for 2-3 cycles to insert a merging uop when reading a full register after writing a partial register. Sandybridge inserts a merging uop without stalling. Haswell/Skylake don't rename low8 in the first place (but insert a merging uop for high8 without stalling).

gcc should be trying to avoid partial-register shenanigans on Core2 / Nehalem, but the penalty is low enough that it's probably not worth changing -mtune=generic.

Related: gcc likes to do set-flags / setcc / movzx, but when a zero-extended bool is needed it would be significantly better to do xor-zero / set-flags / setcc whenever possible. setcc into the low8 of a register zeroed with a recognized zeroing idiom avoids partial-register penalties when reading the full register, and it has a shorter critical path from test -> 32-bit result. It also avoids a false dependency on the old value of the register. (Fun fact: on early P6 (PPro to Pentium III), xor-zeroing was not dependency-breaking, but did avoid partial-register stalls.)

Also, movzx %al, %eax defeats mov-elimination on Intel, so it's always better to movzx to a different architectural register for zero-extension, as long as register pressure allows it and it doesn't cost any extra instructions in total.

Is there already an open bug for either of these latter problems? (Sorry, I have a bad habit of taking bugs off topic.)
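To make the setcc point concrete, here is a minimal sketch. The function name is_nonzero is made up for illustration, and the two instruction sequences in the comment are hand-written for the x86-64 SysV calling convention (argument in %edi, result in %eax); they are not captured gcc output, and what a given gcc version actually emits may differ.

```c
/* Hypothetical example: any function that returns a zero-extended bool
 * in a 32-bit register runs into the choice described above. */
int is_nonzero(int x)
{
    return x != 0;
}

/*
 * Hand-written AT&T-syntax sketches (assumed register allocation):
 *
 * set-flags / setcc / movzx:
 *     testl   %edi, %edi
 *     setne   %al           # writes only the low 8 bits of %eax
 *     movzbl  %al, %eax     # same-register zero-extend, on the critical path
 *     ret
 *
 * xor-zero / set-flags / setcc:
 *     xorl    %eax, %eax    # zeroing idiom; must come before the test,
 *                           # because xor itself overwrites the flags
 *     testl   %edi, %edi
 *     setne   %al           # full %eax is now the 0/1 result
 *     ret
 */
```

The xor has to be hoisted above the flag-setting instruction, which is why this is only possible when a free register is available at that point. In the second sequence the path from test to the 32-bit result is one instruction shorter.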