On Wed, Dec 17, 2025 at 00:16, Georg-Johann Lay <[email protected]> wrote:
>
> When a shift is performed by a shift loop, there are cases
> where the runtime can be improved.  For example, uint32_t R22 >> 5
> is currently
>
>     ldi scratch, 5
> 1:  lsr r25
>     ror r24
>     ror r23
>     ror r22
>     dec scratch
>     brne 1b
>
> but can be done as:
>
>     andi r22, -32   ; Set lower 5 bits to 0.
>     ori r22, 16     ; Set bit 4 to 1.
>     ;; Now r22 = 0b***10000
> 1:  lsr r25
>     ror r24
>     ror r23
>     ror r22
>     brcc 1b         ; Carry will be 0, 0, 0, 0, 1.
>
> This is count-1 cycles faster, where count is the shift offset.
> In the example, that's 4 cycles.
>
> Part 1 of the patch refactors the shift output function so
> that it gets a shift rtx_code instead of an asm template.
>
> Part 2 is the optimization itself.
>
> This is for trunk and passes without new regressions.
> Ok to apply?
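For readers unfamiliar with the trick above: since the low `count` bits of the
low byte are shifted out anyway, they can be replaced by a sentinel pattern
(a single 1 bit preceded by zeros), so the carry flag itself terminates the
loop and no counter register is needed. A minimal Python simulation of this
idea (hypothetical illustration, not part of the patch):

```python
def shift_right_sentinel(value: int, count: int) -> tuple[int, int]:
    """Simulate the counter-free AVR right-shift loop on a 32-bit value.

    The low `count` bits are replaced by a sentinel 0b1 followed by
    count-1 zeros (the andi/ori step); the loop then shifts until the
    sentinel bit emerges as the carry (the brcc step).
    Returns (shifted value, number of loop iterations).
    """
    assert 1 <= count <= 7, "sentinel must fit in the low byte"
    # andi r22, ~mask  then  ori r22, 1 << (count-1)
    v = (value & ~((1 << count) - 1)) | (1 << (count - 1))
    iterations = 0
    while True:
        carry = v & 1   # bit shifted out by the final ror
        v >>= 1         # lsr r25 / ror r24 / ror r23 / ror r22
        iterations += 1
        if carry:       # brcc 1b: fall through once the sentinel appears
            break
    return v, iterations
```

The result matches a plain `value >> count` because only discarded bits were
overwritten, and the loop runs exactly `count` times with no `dec`/`brne` pair.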
Ok. Please apply. Denis.
