On Sun, Jun 13, 2010 at 05:28:28PM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney ([email protected]) wrote:
> > Use a back-to-back pair of pthread_mutex_lock acquisitions and releases
> > to stand in for a memory barrier, given that ARM doesn't seem to do
> > the right thing with __sync_synchronize().  Tested on ARM and on x86.
> > The x86 testing was performed by removing all of the x86-specific
> > tests from configure.ac and then building from scratch.
> > 
> > Signed-off-by: Paul E. McKenney <[email protected]>
> > ---
> >  configure.ac                |    9 +---
> >  urcu/arch_unknown.h         |   87 
> > +++++++++++++++++++++++++++++++++++++++++++
> >  urcu/uatomic_arch_unknown.h |   57 ++++++++++++++++++++++++++++
> >  3 files changed, 146 insertions(+), 7 deletions(-)
> >  create mode 100644 urcu/arch_unknown.h
> >  create mode 100644 urcu/uatomic_arch_unknown.h
> > 
> > diff --git a/configure.ac b/configure.ac
> > index 1b1ca65..9274337 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -54,13 +54,8 @@ case $host_cpu in
> >     *) ARCHTYPE="unknown";;
> >  esac
> >  
> > -if test "$ARCHTYPE" != "unknown"; then
> > -   UATOMICSRC=urcu/uatomic_arch_$ARCHTYPE.h
> > -   ARCHSRC=urcu/arch_$ARCHTYPE.h
> > -else
> > -   UATOMICSRC=urcu/uatomic_generic.h
> > -   ARCHSRC=urcu/arch_generic.h
> > -fi
> > +UATOMICSRC=urcu/uatomic_arch_$ARCHTYPE.h
> > +ARCHSRC=urcu/arch_$ARCHTYPE.h
> >  if test "x$ARCHTYPE" != xx86 -a "x$ARCHTYPE" != xppc; then
> >     APISRC=tests/api_gcc.h
> >  else
> > diff --git a/urcu/arch_unknown.h b/urcu/arch_unknown.h
> > new file mode 100644
> > index 0000000..3f78310
> > --- /dev/null
> > +++ b/urcu/arch_unknown.h
> > @@ -0,0 +1,87 @@
> > +#ifndef _URCU_ARCH_ARMV7_H
> > +#define _URCU_ARCH_ARMV7_H
> > +
> > +/*
> > + * arch_armv7.h: trivial definitions for the ARMv7-A/R architecture.
> > + *
> > + * Copyright (c) 2010 Paul E. McKenney, IBM Corporation.
> > + * Copyright (c) 2009 Mathieu Desnoyers <[email protected]>
> > + *
> > + * This library is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > +*
> > + * This library is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with this library; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 
> > 02110-1301 USA
> > + */
> > +
> > +#include <urcu/compiler.h>
> > +#include <urcu/config.h>
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif 
> > +
> > +/* We don't know, so guess!!! */
> > +#define CACHE_LINE_SIZE    64
> > +
> > +#define mb()    \
> > +   do { \
> > +           pthread_mutex_t __mb_fake_mutex; \
> 
> Can you give a hint telling why this double mutex acquire-release produces the
> equivalent of a memory barrier ? Some comments would be welcome in the code.
> Commenting that this mutex is declared on the local function stack would be
> appropriate too (I had my mind in a twist at first thinking it was somehow
> static).

The compiler (and sometimes the CPU) are within their rights to move
code into a lock-based critical section.  So if we do only one lock
acquisition:

        do_first_thing();
        acquire_lock();
        release_lock();
        do_second_thing();

could legally be transformed to:

        acquire_lock();
        do_second_thing();
        do_first_thing();
        release_lock();

which certainly isn't acting like a memory barrier.  But with two
locks:

        do_first_thing();
        acquire_lock();
        release_lock();
        acquire_lock();
        release_lock();
        do_second_thing();

the worst that the compiler and CPU can do is:

        acquire_lock();
        do_first_thing();
        release_lock();
        acquire_lock();
        do_second_thing();
        release_lock();

which is still acting like a memory barrier.

> BTW, I don't think it is required to keep the next patch separate. Folding it
> into this patch will probably make the following discussion easier.

Fair enough!  But see later comments.

                                                        Thanx, Paul

> Thanks!
> 
> Mathieu
> 
> > +           \
> > +           pthread_mutex_init(&__mb_fake_mutex, NULL); \
> > +           if (pthread_mutex_lock(&__mb_fake_mutex) != 0) { \
> > +                   perror("pthread_mutex_lock"); \
> > +                   abort(); \
> > +           } \
> > +           if (pthread_mutex_unlock(&__mb_fake_mutex) != 0) { \
> > +                   perror("pthread_mutex_lock"); \
> > +                   abort(); \
> > +           } \
> > +           if (pthread_mutex_lock(&__mb_fake_mutex) != 0) { \
> > +                   perror("pthread_mutex_lock"); \
> > +                   abort(); \
> > +           } \
> > +           if (pthread_mutex_unlock(&__mb_fake_mutex) != 0) { \
> > +                   perror("pthread_mutex_lock"); \
> > +                   abort(); \
> > +           } \
> > +   } while (0)
> > +
> > +/*
> > + * Serialize core instruction execution. Also acts as a compiler barrier.
> > + */
> > +#define sync_core()        asm volatile("isb" : : : "memory")
> > +
> > +#include <stdlib.h>
> > +#include <sys/time.h>
> > +
> > +typedef unsigned long long cycles_t;
> > +
> > +static inline cycles_t get_cycles (void)
> > +{
> > +   long long thetime;
> > +   struct timeval tv;
> > +
> > +   if (gettimeofday(&tv, NULL) != 0) {
> > +           perror("gettimeofday");
> > +           abort();
> > +   }
> > +   thetime = ((long long)tv.tv_sec) * 1000000ULL + ((long long)tv.tv_usec);
> > +   return (cycles_t)thetime;
> > +}
> > +
> > +#ifdef __cplusplus 
> > +}
> > +#endif
> > +
> > +#include <urcu/arch_generic.h>
> > +
> > +#endif /* _URCU_ARCH_ARMV7_H */
> > diff --git a/urcu/uatomic_arch_unknown.h b/urcu/uatomic_arch_unknown.h
> > new file mode 100644
> > index 0000000..d1cf93e
> > --- /dev/null
> > +++ b/urcu/uatomic_arch_unknown.h
> > @@ -0,0 +1,57 @@
> > +#ifndef _URCU_ARCH_UATOMIC_ARMV7_H
> > +#define _URCU_ARCH_UATOMIC_ARMV7_H
> > +
> > +/* 
> > + * Copyright (c) 1991-1994 by Xerox Corporation.  All rights reserved.
> > + * Copyright (c) 1996-1999 by Silicon Graphics.  All rights reserved.
> > + * Copyright (c) 1999-2004 Hewlett-Packard Development Company, L.P.
> > + * Copyright (c) 2009      Mathieu Desnoyers
> > + * Copyright (c) 2010      Paul E. McKenney, IBM Corporation
> > + *                    (Adapted from uatomic_arch_ppc.h)
> > + *
> > + * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED
> > + * OR IMPLIED.  ANY USE IS AT YOUR OWN RISK.
> > + *
> > + * Permission is hereby granted to use or copy this program
> > + * for any purpose,  provided the above notices are retained on all copies.
> > + * Permission to modify the code and to distribute modified code is 
> > granted,
> > + * provided the above notices are retained, and a notice that the code was
> > + * modified is included with the above copyright notice.
> > + *
> > + * Code inspired from libuatomic_ops-1.2, inherited in part from the
> > + * Boehm-Demers-Weiser conservative garbage collector.
> > + */
> > +
> > +#include <urcu/compiler.h>
> > +#include <urcu/system.h>
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif 
> > +
> > +#define ILLEGAL_INSTR      ".long  0xd00d00" /* @@@FIXME from ppc to ARM. 
> > */
> > +
> > +/*
> > + * Using a isync as second barrier for exchange to provide acquire 
> > semantic.
> > + * According to uatomic_ops/sysdeps/gcc/powerpc.h, the documentation is 
> > "fairly
> > + * explicit that this also has acquire semantics."
> > + * Derived from AO_compare_and_swap(), but removed the comparison.
> > + */
> > +
> > +/* xchg */
> > +#define uatomic_xchg(addr, v)       __sync_lock_test_and_set(addr, v);
> > +
> > +/* cmpxchg */
> > +#define uatomic_cmpxchg(addr, old, _new) \
> > +   __sync_val_compare_and_swap(addr, old, _new)
> > +
> > +/* uatomic_add_return */
> > +#define uatomic_add_return(addr, v)  __sync_add_and_fetch(addr, v)
> > +
> > +#ifdef __cplusplus 
> > +}
> > +#endif
> > +
> > +#include <urcu/uatomic_generic.h>
> > +
> > +#endif /* _URCU_ARCH_UATOMIC_ARMV7_H */
> > -- 
> > 1.7.0.6
> > 
> 
> -- 
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com

_______________________________________________
ltt-dev mailing list
[email protected]
http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev

Reply via email to