From: Eric Dumazet
> Sent: 05 January 2016 22:19
> To: Tom Herbert
> You might add a comment telling the '4' comes from length of 'adcq
> 6*8(%rdi),%rax' instruction, and that the 'nop' is to compensate that
> 'adcq 0*8(%rdi),%rax' is using 3 bytes instead.
>
> We also could use .byte 0x48, 0x13, 0x47, 0x00 to force a 4 bytes
> instruction and remove the nop.
>
>
> +	lea	20f(, %rcx, 4), %r11
> +	clc
> +	jmp	*%r11
> +
> +.align 8
> +	adcq	6*8(%rdi),%rax
> +	adcq	5*8(%rdi),%rax
> +	adcq	4*8(%rdi),%rax
> +	adcq	3*8(%rdi),%rax
> +	adcq	2*8(%rdi),%rax
> +	adcq	1*8(%rdi),%rax
> +	adcq	0*8(%rdi),%rax // could force a 4 byte instruction (.byte 0x48, 0x13, 0x47, 0x00)
> +	nop
> +20:	/* #quads % 8 jump table base */

Or move the label to the top (after the .align) and adjust the maths.
You could add a second label after the first adcq and use the
difference between them for the '4'.

	David
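
For anyone checking the arithmetic, here is a rough model of the byte offsets in the quoted table. It is only a sketch: the names (TABLE, STRIDE, jump_offset) are made up for illustration, the instruction lengths are the ones stated in the thread, and it assumes %rcx holds the negated quad count when the lea executes, which the quoted hunk does not show.

```python
# Hypothetical model of the quoted jump table: (mnemonic, encoded length in bytes).
# The disp8 forms are 4 bytes; the zero-displacement form is 3 bytes (per the thread).
TABLE = [(f"adcq {i}*8(%rdi),%rax", 4) for i in range(6, 0, -1)]
TABLE.append(("adcq 0*8(%rdi),%rax", 3))
TABLE.append(("nop", 1))

STRIDE = 4  # the '4' in: lea 20f(, %rcx, 4), %r11


def instruction_starts(table):
    """Return each instruction's start offset and the offset just past the table."""
    starts, offset = [], 0
    for _, length in table:
        starts.append(offset)
        offset += length
    return starts, offset  # the second value is where label 20 sits


def jump_offset(quads, base):
    """Offset reached for `quads` quadwords (assumes %rcx == -quads)."""
    return base - STRIDE * quads


starts, label_20 = instruction_starts(TABLE)
valid = set(starts) | {label_20}  # legal landing points, including the label
targets = [jump_offset(q, label_20) for q in range(8)]
assert all(t in valid for t in targets)

# Drop the nop: the label moves to offset 27 and every non-zero quad count
# lands in the middle of an instruction.
starts_no_nop, label_no_nop = instruction_starts(TABLE[:-1])
misaligned = [q for q in range(1, 8)
              if jump_offset(q, label_no_nop) not in starts_no_nop]

# Label at the top instead: target = top + STRIDE * (7 - quads).
top_targets = [STRIDE * (7 - q) for q in range(8)]
assert top_targets == targets
```

In this model the zero-quad case with the label at the top still targets offset 28, so the short final adcq needs the nop (or the forced 4-byte encoding) either way.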