From: Richard Henderson <[EMAIL PROTECTED]>
Date: Sat, 17 Dec 2005 14:38:24 -0800

> You might consider just beginning your loops like
> 
>       mov     zero, old
>       cas     [mem], zero, old
> 
> to do the initial read, since old will now contain the 
> contents of the memory, and we haven't changed the memory.

CAS is 32 cycles minimum on sparc64 even on a cache hit, so I think
the prefetch+load will be faster :-)  But it deserves checking out,
that's for sure.

Either way, that is a clever use of CAS :)
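
For reference, a minimal sketch of the quoted trick in portable C11
atomics rather than sparc assembly. The helper names cas_read and
cas_fetch_add are hypothetical, and the C11 compare-exchange call only
stands in for the cas instruction. Comparing against zero and storing
zero cannot change memory: on a match zero is replaced with zero, and
on a mismatch nothing is stored. Either way "old" ends up holding the
current contents.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: read *mem using only a compare-and-swap.
 * Expected value 0, new value 0, so memory is never modified. */
static uint32_t cas_read(_Atomic uint32_t *mem)
{
    uint32_t old = 0;
    /* On failure the current value of *mem is written into old;
     * on success *mem was 0 and old is still 0. */
    atomic_compare_exchange_strong(mem, &old, 0);
    return old;
}

/* Hypothetical loop that begins with the CAS-based read:
 * atomically adds n to *mem and returns the previous value. */
static uint32_t cas_fetch_add(_Atomic uint32_t *mem, uint32_t n)
{
    uint32_t old = cas_read(mem);
    /* A failed exchange refreshes old with the current value, so retry. */
    while (!atomic_compare_exchange_weak(mem, &old, old + n))
        ;
    return old;
}
```

Whether this beats a plain load is the open question above; the sketch
only shows that the initial read is correct, not that it is fast.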
