I'd have to go back and review the Intel documentation to be sure, but for
this particular algorithm an explicit memory barrier may be required on
Intel architecture as well.  If I remember correctly, without a memory
barrier, Intel architecture only guarantees total memory ordering within a
single cache line.  In this algorithm we have an array of 16 cache_elements
of 48 bytes each, so half of the cache_elements straddle a 64-byte
cache-line boundary (if the array starts on a line boundary, every three
cache lines hold four elements, and the second and third of each group of
four cross a boundary).  When we read cache_element->key after copying the
cache_element value, the reads of the value must be ordered before the
read of cache_element->key, so a memory barrier is needed just before the
read of cache_element->key to guarantee that ordering.
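A minimal sketch of the read-side ordering described above, with a matching
writer, written against C11 <stdatomic.h> rather than whatever primitives
the real code uses.  All names here (the key/value/key_validate fields,
cache_init, cache_store, cache_lookup, VALUE_WORDS, EMPTY_KEY) are
hypothetical.  It assumes writers to a slot are serialized (e.g. by a
mutex, not shown), that the keys stored into a slot never decrease (as
with timestamps), and that the value depends only on the key; without
those assumptions a key/validate pair alone does not catch every torn
read.  The acquire fence in cache_lookup() plays the role of the barrier
placed just before the post-copy key check.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: 16 slots of 48 bytes each (key, four value words,
 * validation key), matching the sizes discussed above. */
#define CACHE_SIZE  16
#define VALUE_WORDS 4
#define EMPTY_KEY   INT64_MIN           /* sentinel for a never-written slot */

struct cache_element {
    _Atomic int64_t key;                /* stored last by the writer  */
    _Atomic int64_t value[VALUE_WORDS]; /* the cached payload         */
    _Atomic int64_t key_validate;       /* stored first by the writer */
};

static struct cache_element cache[CACHE_SIZE];

/* Call once before any reader or writer thread starts. */
void cache_init(void)
{
    for (int i = 0; i < CACHE_SIZE; i++) {
        atomic_store_explicit(&cache[i].key, EMPTY_KEY, memory_order_relaxed);
        atomic_store_explicit(&cache[i].key_validate, EMPTY_KEY,
                              memory_order_relaxed);
        for (int j = 0; j < VALUE_WORDS; j++)
            atomic_store_explicit(&cache[i].value[j], 0, memory_order_relaxed);
    }
}

/* Writer. Assumes writers are serialized (e.g. by a mutex, not shown),
 * keys stored into a slot never decrease, and value depends only on key. */
void cache_store(int64_t key, const int64_t value[VALUE_WORDS])
{
    struct cache_element *e = &cache[(uint64_t)key % CACHE_SIZE];

    atomic_store_explicit(&e->key_validate, key, memory_order_relaxed);
    /* Release fence: a reader that observes any new value word and then
     * runs its acquire fence is guaranteed to observe this key_validate. */
    atomic_thread_fence(memory_order_release);
    for (int i = 0; i < VALUE_WORDS; i++)
        atomic_store_explicit(&e->value[i], value[i], memory_order_relaxed);
    /* Release store: a reader that observes the new key also sees the value. */
    atomic_store_explicit(&e->key, key, memory_order_release);
}

/* Reader. Returns 1 and fills out[] on a consistent hit; returns 0 on a
 * miss or when a concurrent update may have torn the copy. */
int cache_lookup(int64_t key, int64_t out[VALUE_WORDS])
{
    struct cache_element *e = &cache[(uint64_t)key % CACHE_SIZE];

    if (atomic_load_explicit(&e->key, memory_order_acquire) != key)
        return 0;
    for (int i = 0; i < VALUE_WORDS; i++)
        out[i] = atomic_load_explicit(&e->value[i], memory_order_relaxed);
    /* The barrier in question: keep the value loads above ordered before
     * the key check below, even if the element straddles cache lines. */
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&e->key_validate, memory_order_relaxed) == key;
}
```

Because the ordering is expressed with C11 fences rather than hand-written
barrier instructions, the compiler emits whatever the target architecture
needs (possibly nothing beyond a compiler barrier), so the same source
works without per-architecture #define switches.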

On Sat, Jan 5, 2013 at 5:08 AM, Igor Galić <i.ga...@brainsware.org> wrote:

>
> > > Sigh. I was too much focused on x86. There the compiler barrier
> > > caused
> > > by the function call is enough. But you are right, on other
> > > architectures these functions may also require cpu memory barriers.
> >
> > " on x86 … is enough" - would it harm x86 to add CPU barriers, or
> > do we want to have # define distinctions per arch?
>
> ignore me, I just realized it's going to be different
> calls per arch anyway!
>
> --
> Igor Galić
>
> Tel: +43 (0) 664 886 22 883
> Mail: i.ga...@brainsware.org
> URL: http://brainsware.org/
> GPG: 6880 4155 74BD FD7C B515  2EA5 4B1D 9E08 A097 C9AE
>
>
