On Feb 13, 2007, at 7:37 PM, Brian W. Barrett wrote:

On Feb 13, 2007, at 5:16 PM, Jeff Squyres wrote:

On Feb 13, 2007, at 7:10 PM, George Bosilca wrote:

It's already in the 1.2!!! I don't know how much you care about
performance, but I do. This patch increases the latency by 10%. It
might be correct for the PathScale compiler, but it doesn't look like
a hard requirement for all the other compilers. A memory barrier for
an initialization and an unlock definitely looks like killing an ant
with a nuclear strike.

Can we roll this back and find some other way?

Yes, we can.

It's not actually the memory barrier we need; what we need is to tell
the compiler not to do anything stupid, because we expect memory to be
invalidated. I'll commit a new, different fix tonight.

Upon further review, I'm wrong again. The original patch was wrong (not sure what I was thinking this afternoon), and my statement above is wrong as well. The problem starts with code like this:

a = 1
mylock->lock = 0
b = 2

Which is essentially what you have after inlining the atomic unlock, as it occurred today. It's not at all unreasonable for a compiler to reorder that (we have seen this in practice, including with GCC on LA-MPI, and it is likely happening now without us realizing it) into:

a = 1
b = 2
mylock->lock = 0

or

mylock->lock = 0
a = 1
b = 2

After all, there are no memory dependencies among those three lines of code. When we had the compare-and-swap for unlock, there was a memory dependency: either the compare-and-swap inline assembly hinted to the compiler that memory was changed by the op, so it wouldn't reorder memory accesses across that boundary, or the compare-and-swap wasn't inlined at all. Compilers are pretty much not going to reorder memory accesses across a function call unless it's 100% clear that there is no side effect that might matter, which is basically never the case in C.
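For reference, here's roughly the kind of compare-and-swap we had, written as GCC inline assembly for x86. The names and exact constraints here are mine, just for illustration, not the actual Open MPI sources; the point is the "memory" clobber, which is what kept the compiler from moving other stores across the unlock:

#include <stdint.h>

/* Illustrative only: an x86 compare-and-swap with a "memory" clobber.
 * The clobber tells GCC that memory may have changed, so it won't
 * reorder loads/stores across this statement. */
static inline int
example_atomic_cmpset_32(volatile int32_t *addr,
                         int32_t oldval, int32_t newval)
{
    unsigned char ret;
    __asm__ __volatile__("lock; cmpxchgl %3, %1 \n\t"
                         "sete %0"
                         : "=q" (ret), "+m" (*addr), "+a" (oldval)
                         : "r" (newval)
                         : "memory", "cc");
    return (int) ret;
}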

Ok, so with a little care we can tell the compiler not to reorder memory accesses, either with compiler hints (inline assembly statements that include the "memory" clobber hint) or by making atomic_unlock a real, non-inlined function.
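For the first option, the hint can be as small as an empty asm statement; something along these lines (the macro name is made up for illustration) compiles to no instructions at all but still acts as a compiler-level fence:

/* Compiler-only barrier: emits no instructions, but the "memory"
 * clobber forbids the compiler from reordering memory accesses
 * across it. It does nothing about hardware reordering. */
#define example_compiler_barrier() __asm__ __volatile__("" : : : "memory")

Note that this only constrains the compiler; it says nothing about what the CPU or memory controller do at run time, which is the next problem.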

But now we start running on hardware, and the CPU / memory controller is free to start reordering the stores. We don't have any instructions telling the hardware not to reorder our original writes, so it can still produce either of the two bad cases above. That's still not good for us and could definitely lead to incorrect programs. So we need a memory barrier, or we have potentially invalid code.

The full memory barrier is totally overkill for this situation, but some memory barrier is needed. While perhaps not quite correct, I believe that something like:

static inline void
opal_atomic_unlock(opal_atomic_lock_t *lock)
{
  /* order all prior writes before the store that releases the lock */
  opal_atomic_wmb();
  lock->u.lock = OPAL_ATOMIC_UNLOCKED;
}

would be more correct than having the barrier after the write, and would perform slightly better than the full atomic barrier. On x86 and x86_64, write memory barriers are essentially "free": all they do is limit the compiler's reordering of memory accesses. But on PPC, SPARC, and Alpha, the barrier would have a performance cost. I don't know what that cost is, but I do know we need to pay it for correctness.
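To make that concrete, here's roughly what I'd expect opal_atomic_wmb() to boil down to on a couple of architectures (a sketch under my assumptions, with an illustrative name, not the exact code in our tree):

#if defined(__x86_64__) || defined(__i386__)
/* x86 doesn't reorder stores with other stores, so the write barrier
 * only needs to be a compiler barrier. */
static inline void example_wmb(void)
{
    __asm__ __volatile__("" : : : "memory");
}
#elif defined(__powerpc__)
/* PPC needs a real instruction to order the stores; lwsync (or eieio)
 * is lighter weight than a full sync. */
static inline void example_wmb(void)
{
    __asm__ __volatile__("lwsync" : : : "memory");
}
#endif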

Long term, we should probably implement the spinlocks entirely in inline assembly. That wouldn't make a huge performance difference, but at least I could make sure the memory barrier is in the right place and keep the compiler from doing something stupid.
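Purely as a sketch of what I have in mind, for x86 only (the type and function names are made up here; the real thing would sit behind our existing opal_atomic_* interface):

#include <stdint.h>

typedef struct { volatile int32_t lock; } example_spinlock_t;

static inline void example_spin_lock(example_spinlock_t *l)
{
    int32_t tmp = 1;
    do {
        /* xchg with memory on x86 is implicitly locked and acts as
         * a full barrier */
        __asm__ __volatile__("xchgl %0, %1"
                             : "+r" (tmp), "+m" (l->lock)
                             :
                             : "memory");
    } while (tmp != 0);
}

static inline void example_spin_unlock(example_spinlock_t *l)
{
    /* On x86 a compiler barrier plus a plain store is enough to
     * release the lock; other architectures would need a real write
     * barrier (like the wmb above) before the store. */
    __asm__ __volatile__("" : : : "memory");
    l->lock = 0;
}

This way the barrier and the store that releases the lock are pinned together, and the compiler can't separate or reorder them.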

By the way, this is what the Linux kernel does, adding credence to my argument, I hope ;).


Brian

--
  Brian Barrett
  Open MPI Team, CCS-1
  Los Alamos National Laboratory

