On Fri, Jul 23, 2010 at 10:56:24AM -0500, richardvo...@gmail.com wrote:
> [snip]
> 
> > +      unsigned long patternl = 0;
> > +      grub_size_t i;
> > +
> > +      for (i = 0; i < sizeof (unsigned long); i++)
> > +       patternl |= ((unsigned long) pattern8) << (8 * i);
> > +
> 
> might I suggest:
> 
> unsigned long patternl = pattern8;
> patternl |= patternl << 8;
> patternl |= patternl << 16;
> patternl |= patternl << 32;
> patternl |= patternl << 64;
> 
> O(lg N) instead of O(N), no loop, no branches, and the compiler should be
> smart enough to optimize away the last two lines on systems with narrower
> long.

I no longer have the system on which I benchmarked this.  However, since
N is always either 4 or 8 on current targets, this can only amount to
micro-optimisation which I don't think can possibly matter much; we're
talking a handful of cycles at most.  Do we really need to spend time
bikeshedding this?  The important thing is taking only a cache stall per
long rather than a cache stall per byte; anything else is likely to be
noise.
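
For reference, here is a minimal sketch of both ways of building the pattern, plus the long-at-a-time fill they feed into. This is not the patch itself: the names replicate_loop, replicate_doubling and fill_bytes are hypothetical, it assumes 8-bit bytes and a hosted C environment, and it uses memcpy for the wide store to stay clear of aliasing rules. Note that in C a shift by a count greater than or equal to the width of the type is undefined, so the doubling form below stops before the shift reaches the width of unsigned long rather than relying on the compiler to drop the wide shifts.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Loop form, as in the patch: one OR per byte of unsigned long. */
static unsigned long replicate_loop(unsigned char pattern8)
{
    unsigned long patternl = 0;
    size_t i;

    for (i = 0; i < sizeof(unsigned long); i++)
        patternl |= ((unsigned long) pattern8) << (8 * i);
    return patternl;
}

/* Doubling form: each step doubles the number of filled bytes.
   The shift count always stays below the width of unsigned long. */
static unsigned long replicate_doubling(unsigned char pattern8)
{
    unsigned long patternl = pattern8;
    unsigned int shift;

    for (shift = 8; shift < sizeof(unsigned long) * CHAR_BIT; shift *= 2)
        patternl |= patternl << shift;
    return patternl;
}

/* Hypothetical fill routine: byte stores up to alignment, then one
   store per unsigned long, then byte stores for the tail. */
static void *fill_bytes(void *dst, unsigned char pattern8, size_t len)
{
    unsigned char *p = dst;
    unsigned long patternl = replicate_doubling(pattern8);

    assert(dst != NULL || len == 0);

    /* Head: advance until p is aligned for unsigned long. */
    while (len > 0 && ((uintptr_t) p % sizeof(unsigned long)) != 0) {
        *p++ = pattern8;
        len--;
    }

    /* Body: one long-sized store per iteration. */
    while (len >= sizeof(unsigned long)) {
        memcpy(p, &patternl, sizeof(patternl));
        p += sizeof(unsigned long);
        len -= sizeof(unsigned long);
    }

    /* Tail: remaining bytes, fewer than sizeof(unsigned long). */
    while (len > 0) {
        *p++ = pattern8;
        len--;
    }
    return dst;
}
```

Either replicate function runs once per call, outside the body loop, which is why the choice between them should be hard to measure next to the cost of the stores themselves.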

-- 
Colin Watson                                       [cjwat...@ubuntu.com]

_______________________________________________
Grub-devel mailing list
Grub-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/grub-devel
