On Fri, Jul 23, 2010 at 10:56:24AM -0500, richardvo...@gmail.com wrote:
> [snip]
>
> > +      unsigned long patternl = 0;
> > +      grub_size_t i;
> > +
> > +      for (i = 0; i < sizeof (unsigned long); i++)
> > +        patternl |= ((unsigned long) pattern8) << (8 * i);
> > +
>
> might I suggest:
>
> unsigned long patternl = pattern8;
> patternl |= patternl << 8;
> patternl |= patternl << 16;
> patternl |= patternl << 32;
> patternl |= patternl << 64;
>
> O(lg N) instead of O(N), no loop, no branches, and the compiler should be
> smart enough to optimize away the last two lines on systems with narrower
> long.
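
For comparison, here is a minimal standalone sketch of the two ways of
building the pattern discussed above. It is hosted C rather than GRUB code,
the function names pattern_loop and pattern_doubling are hypothetical, and
it assumes 8-bit bytes and an unsigned long of at most 64 bits. One caveat
on the doubling form as quoted: in C, shifting by a count greater than or
equal to the width of the type is undefined behaviour, so the 32-bit step is
written here as two 16-bit shifts and the 64-bit step is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Loop form, as in the patch: one iteration per byte of unsigned long.  */
static unsigned long
pattern_loop (unsigned char pattern8)
{
  unsigned long patternl = 0;
  size_t i;

  for (i = 0; i < sizeof (unsigned long); i++)
    patternl |= ((unsigned long) pattern8) << (8 * i);
  return patternl;
}

/* Doubling form: each step doubles the number of bytes already filled.
   Assumption: unsigned long is no wider than 64 bits.  */
static unsigned long
pattern_doubling (unsigned char pattern8)
{
  unsigned long patternl = pattern8;

  assert (sizeof (unsigned long) * CHAR_BIT <= 64);

  patternl |= patternl << 8;
  patternl |= patternl << 16;
  /* Two 16-bit shifts stay well defined on a 32-bit long (the result is
     simply 0 there) and are equivalent to << 32 on a 64-bit long.  */
  patternl |= (patternl << 16) << 16;
  return patternl;
}
```

Both functions should return the same value for any byte, for example
0xABABABAB for 0xAB on a target with a 32-bit long.
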
I no longer have the system on which I benchmarked this. However, since
N is always either 4 or 8 on current targets, this can only amount to
micro-optimisation which I don't think can possibly matter much; we're
talking a handful of cycles at most. Do we really need to spend time
bikeshedding this? The important thing is taking only a cache stall per
long rather than a cache stall per byte; anything else is likely to be
noise.

-- 
Colin Watson                                       [cjwat...@ubuntu.com]

_______________________________________________
Grub-devel mailing list
Grub-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/grub-devel
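
To illustrate the point about one store per long rather than one per byte,
here is a rough sketch of a word-at-a-time fill. It is not the patch under
review: fill_bytes is a hypothetical name, the code is hosted C, and it
assumes sizeof (unsigned long) is a power of two. It uses memcpy for the
word store to stay within the C aliasing rules; a freestanding
implementation would more likely store through an unsigned long pointer
directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill len bytes at s with the byte value c, one long at a time where
   possible.  Returns s, like memset.  */
static void *
fill_bytes (void *s, int c, size_t len)
{
  unsigned char *p = s;
  unsigned char pattern8 = (unsigned char) c;
  unsigned long patternl = 0;
  size_t i;

  assert (s != NULL || len == 0);

  /* Replicate the byte into every byte of a long.  */
  for (i = 0; i < sizeof (unsigned long); i++)
    patternl |= ((unsigned long) pattern8) << (8 * i);

  /* Head: byte stores until p is aligned to the size of a long.  */
  while (len > 0 && ((uintptr_t) p & (sizeof (unsigned long) - 1)) != 0)
    {
      *p++ = pattern8;
      len--;
    }

  /* Body: one aligned, long-sized store per iteration.  */
  while (len >= sizeof (unsigned long))
    {
      memcpy (p, &patternl, sizeof (patternl));
      p += sizeof (unsigned long);
      len -= sizeof (unsigned long);
    }

  /* Tail: whatever is left over, byte by byte.  */
  while (len > 0)
    {
      *p++ = pattern8;
      len--;
    }

  return s;
}
```

How the pattern is built only affects the few instructions before the
loops; the body loop is where the per-long stores happen.
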