https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109907
--- Comment #11 from Georg-Johann Lay <gjl at gcc dot gnu.org> ---
I tried with the test case, but the expensive shifts are still there except for
the cset_32bit30_not case, which improved as noted above.

cset_32bit30 however goes from the 3-instruction code to:

cset_32bit30:
	movw r26,r24	 ;  19	[c=4 l=2]  *movsi/0
	movw r24,r22
	ldi r18,30	 ;  26	[c=44 l=7]  *lshrsi3_const/3
1:	lsr r27
	ror r26
	ror r25
	ror r24
	dec r18
	brne 1b
	andi r24,lo8(1)	 ;  21	[c=4 l=1]  *andqi3/1
	ret		 ;  24	[c=0 l=1]  return

So we are back where we started?  All except one case use expensive shift.
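
For readers following along, here is a rough C sketch of why the loop above counts as expensive. It is an illustration only: the test case source is not reproduced in this comment, so the C functions and their names below are assumptions, not the actual test. The sketch models the 30-iteration shift loop from the listing and compares it with a test that only looks at the most significant byte, since bit 30 of a 32-bit value is bit 6 of its top byte.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of the source-level operation: extract bit 30. */
static uint8_t bit30_shift(uint32_t x)
{
    return (uint8_t)((x >> 30) & 1);
}

/* Models the generated loop: 30 one-bit logical right shifts of the
   whole 32-bit value (lsr/ror/ror/ror), then "andi r24,1". */
static uint8_t bit30_loop(uint32_t x)
{
    for (uint8_t n = 30; n != 0; n--)
        x >>= 1;
    return (uint8_t)(x & 1);
}

/* Cheaper equivalent: only the top byte matters, and bit 30 of x
   is bit 6 of that byte. */
static uint8_t bit30_top_byte(uint32_t x)
{
    uint8_t hi = (uint8_t)(x >> 24);
    return (uint8_t)((hi >> 6) & 1);
}

/* Spot-check that all three variants agree on a few inputs. */
static void check_bit30_variants(void)
{
    static const uint32_t samples[] = {
        0x00000000u, 0x40000000u, 0x80000000u,
        0xBFFFFFFFu, 0xFFFFFFFFu, 0x12345678u
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(bit30_loop(samples[i]) == bit30_shift(samples[i]));
        assert(bit30_top_byte(samples[i]) == bit30_shift(samples[i]));
    }
}
```

Calling check_bit30_variants() exercises all three forms. The point is only that the result depends on a single bit of one register, so a full 32-bit shift loop does far more work than the result requires.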