Interestingly, this is also what clang does: https://godbolt.org/z/Y4v14f9s3
> On 17/08/2022 02:21 CEST J. Gareth Moreton via fpc-devel
> <fpc-devel@lists.freepascal.org> wrote:
>
>
> Hi everyone,
>
> Recently I've made some optimisations centred around the SHR instruction
> on x86, and there was one pair of instructions that caught my attention:
>
>     movl (%rbx),%eax
>     shrl $24,%eax
>
> Is it permissible to optimise this to (x86 is little-endian):
>
>     movzbl 3(%rbx),%eax
>
> (You could also optimise "movl; sarl" into a "movsbl" instruction this way.)
>
> Logically the result is the same and it removes an instruction and a
> pipeline stall, but will there be a performance hit that comes from
> reading an unaligned byte of memory like that?
>
> I did make a similar optimisation once before with QWords using the
> implicit zero-extension of the 32-bit MOV instruction, that is:
>
>     movq (%rbx),%rax
>     shrq $32,%rax
>
> To:
>
>     movl 4(%rbx),%eax
>
> This one is a little nicer though because it's still on a 32-bit
> boundary and so was permissible.
>
> Gareth aka. Kit
>
> _______________________________________________
> fpc-devel maillist - fpc-devel@lists.freepascal.org
> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel