From: Al Viro <v...@ftp.linux.org.uk> On Behalf Of Al Viro
> Sent: 16 April 2021 20:44
>
> On Fri, Apr 16, 2021 at 12:24:13PM -0700, Eric Dumazet wrote:
> > From: Eric Dumazet <eduma...@google.com>
> >
> > We have to loop only to copy u64 values.
> > After this first loop, we copy at most one u32, one u16 and one byte.
>
> Does it actually yield a better code?
>
> FWIW, this
> void bar(unsigned);
> void foo(unsigned n)
> {
> 	while (n >= 8) {
> 		bar(n);
> 		n -= 8;
> 	}
> 	while (n >= 4) {
> 		bar(n);
> 		n -= 4;
> 	}
> 	while (n >= 2) {
> 		bar(n);
> 		n -= 2;
> 	}
> 	while (n >= 1) {
> 		bar(n);
> 		n -= 1;
> 	}
> }

This variant might be better:

void foo(unsigned n)
{
	while (n >= 8) {
		bar(8);
		n -= 8;
	}
	if (likely(!n))
		return;
	if (n & 4)
		bar(4);
	if (n & 2)
		bar(2);
	if (n & 1)
		bar(1);
}

I think Al's version might have optimised down to this, but Eric's asm
contains the n -= 4/2/1;

OTOH gcc can make a real pig's breakfast of code like this!

	David
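
For anyone who wants to try the bit-test tail outside the kernel, here is a minimal userspace sketch of the same shape. It is not the code from Eric's patch: copy_chunks() is a hypothetical name, memcpy() stands in for the sized copies, and likely() is redefined locally on the assumption that the compiler provides __builtin_expect() (gcc and clang do). It returns the number of chunk copies so the behaviour is easy to check.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel's likely() hint (assumes gcc/clang). */
#define likely(x) __builtin_expect(!!(x), 1)

/*
 * Hypothetical helper, for illustration only: copy n bytes in 8-byte
 * chunks, then at most one 4-byte, one 2-byte and one 1-byte chunk.
 * Returns how many chunk copies were made.
 */
static unsigned copy_chunks(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	unsigned calls = 0;

	/* Only the 8-byte copy needs a loop. */
	while (n >= 8) {
		memcpy(d, s, 8);
		d += 8;
		s += 8;
		n -= 8;
		calls++;
	}
	if (likely(!n))
		return calls;

	/* n is now 1..7, so each bit selects at most one tail copy. */
	if (n & 4) {
		memcpy(d, s, 4);
		d += 4;
		s += 4;
		calls++;
	}
	if (n & 2) {
		memcpy(d, s, 2);
		d += 2;
		s += 2;
		calls++;
	}
	if (n & 1) {
		memcpy(d, s, 1);
		calls++;
	}
	return calls;
}
```

With n = 23 this makes two 8-byte copies and then one each of 4, 2 and 1 bytes. Note that n itself is never decremented in the tail, only the pointers move.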