From: Richard Henderson <[EMAIL PROTECTED]>
Date: Sat, 17 Dec 2005 14:38:24 -0800

> You might consider just beginning your loops like
>
>         mov     zero, old
>         cas     [mem], zero, old
>
> to do the initial read, since old will now contain the
> contents of the memory, and we haven't changed the memory.

CAS is 32 cycles minimum on sparc64 even on a cache hit, so
I think the prefetch+load will be faster :-)

But it deserves checking out, that's for sure. Either way,
that is a clever use of CAS :)
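
For anyone who wants to play with the idea outside of sparc64 assembly, here is a rough sketch of the same trick using C11 atomics as a portable stand-in for the `cas` instruction. The function names (`cas_read`, `fetch_add_cas`, `cas_read_example`) are made up for illustration and are not from any patch in this thread. The sketch only shows that a compare-and-swap of zero against zero returns the current contents without modifying memory; it says nothing about the cycle cost discussed above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Read *mem using only compare-and-swap: compare against 0 and "swap in" 0.
 * If *mem is 0, the CAS stores 0 again (no visible change). Otherwise the
 * CAS fails and copies the current value into `old`. Either way `old` ends
 * up holding the memory contents and the memory is unchanged. */
static uint32_t cas_read(_Atomic uint32_t *mem)
{
    uint32_t old = 0;
    atomic_compare_exchange_strong(mem, &old, 0);
    return old;
}

/* Hypothetical CAS loop that begins with the CAS-based read instead of a
 * plain load. Returns the value seen before the addition. */
static uint32_t fetch_add_cas(_Atomic uint32_t *mem, uint32_t delta)
{
    uint32_t old = cas_read(mem);
    while (!atomic_compare_exchange_weak(mem, &old, old + delta)) {
        /* On failure `old` was refreshed with the current value; retry. */
    }
    return old;
}

/* Small usage example. */
static void cas_read_example(void)
{
    _Atomic uint32_t counter = 41;
    assert(cas_read(&counter) == 41);         /* read, memory untouched */
    assert(fetch_add_cas(&counter, 1) == 41); /* returns the old value  */
    assert(atomic_load(&counter) == 42);
}
```

Whether starting the loop this way beats prefetch+load is exactly the thing that would need measuring on real hardware.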
