On Tue, Jul 04, 2017 at 12:50:55PM -0700, Palmer Dabbelt wrote:
> +/*
> + * FIXME: I could only find documentation that atomic_{add,sub,inc,dec} are
> + * barrier-free.  I'm assuming that and/or/xor have the same constraints as 
> the
> + * others.
> + */

Yes.. we have new documentation in the work to which I would post a link
but for some reason copy/paste stopped working again (Konsole does that
at times and is #$%#$%#4# annoying).

Ha, found it using google...

  https://marc.info/?l=linux-kernel&m=14972790112580


> +

> +/*
> + * atomic_{cmp,}xchg is required to have exactly the same ordering semantics 
> as
> + * {cmp,}xchg and the operations that return, so they need a barrier.  We 
> just
> + * use the other implementations directly.
> + */

cmpxchg triggers an extra rule; all conditional operations only need to
imply barriers on success. So a cmpxchg that fails, need not imply any
ordering what so ever.

> +/*
> + * Our atomic operations set the AQ and RL bits and therefor we don't need to
> + * fence around atomics.
> + */
> +#define __smb_mb__before_atomic()    barrier()
> +#define __smb_mb__after_atomic()     barrier()

Ah, not quite... you need full barriers here. Because your regular
atomic ops imply no ordering what so ever.

> +/*
> + * These barries are meant to prevent memory operations inside a spinlock 
> from
> + * moving outside of that spinlock.  Since we set the AQ and RL bits when
> + * entering or leaving spinlocks, no additional fence needs to be performed.
> + */
> +#define smb_mb__before_spinlock()    barrier()
> +#define smb_mb__after_spinlock()     barrier()

Also probably not true. I _think_ you want a full barrier here, but
given the total lack of documentation on your end and the fact I've not
yet read the spinlock (which I suppose is below) I cannot yet state
more.


> +
> +/* FIXME: I don't think RISC-V is allowed to perform a speculative load. */
> +#define smp_acquire__after_ctrl_dep()        barrier()

That would be a very weird thing to disallow... speculative loads are
teh awesome ;-) Note you can get the very same effect from caches when
your stores are not globally atomic.

> +/*
> + * The RISC-V ISA doesn't support byte or half-word AMOs, so we fall back to 
> a
> + * regular store and a fence here.  Otherwise we emit an AMO with an AQ or RL
> + * bit set and allow the microarchitecture to avoid the other half of the 
> AMO.
> + */
> +#define __smp_store_release(p, v)                                    \
> +do {                                                                 \
> +     union { typeof(*p) __val; char __c[1]; } __u =                  \
> +             { .__val = (__force typeof(*p)) (v) };                  \
> +     compiletime_assert_atomic_type(*p);                             \
> +     switch (sizeof(*p)) {                                           \
> +     case 1:                                                         \
> +     case 2:                                                         \
> +             smb_mb();                                               \
> +             WRITE_ONCE(*p, __u.__val);                              \
> +             break;                                                  \
> +     case 4:                                                         \
> +             __asm__ __volatile__ (                                  \
> +                     "amoswap.w.rl zero, %1, %0"                     \
> +                     : "+A" (*p), "r" (__u.__val)                    \
> +                     :                                               \
> +                     : "memory");                                    \
> +             break;                                                  \
> +     case 8:                                                         \
> +             __asm__ __volatile__ (                                  \
> +                     "amoswap.d.rl zero, %1, %0"                     \
> +                     : "+A" (*p), "r" (__u.__val)                    \
> +                     :                                               \
> +                     : "memory");                                    \
> +             break;                                                  \
> +     }                                                               \
> +} while (0)
> +
> +#define __smp_load_acquire(p)                                                
> \
> +do {                                                                 \
> +     union { typeof(*p) __val; char __c[1]; } __u =                  \
> +             { .__val = (__force typeof(*p)) (v) };                  \
> +     compiletime_assert_atomic_type(*p);                             \
> +     switch (sizeof(*p)) {                                           \
> +     case 1:                                                         \
> +     case 2:                                                         \
> +             __u.__val = READ_ONCE(*p);                              \
> +             smb_mb();                                               \
> +             break;                                                  \
> +     case 4:                                                         \
> +             __asm__ __volatile__ (                                  \
> +                     "amoor.w.aq %1, zero, %0"                       \
> +                     : "+A" (*p)                                     \
> +                     : "=r" (__u.__val)                              \
> +                     : "memory");                                    \
> +             break;                                                  \
> +     case 8:                                                         \
> +             __asm__ __volatile__ (                                  \
> +                     "amoor.d.aq %1, zero, %0"                       \
> +                     : "+A" (*p)                                     \
> +                     : "=r" (__u.__val)                              \
> +                     : "memory");                                    \
> +             break;                                                  \
> +     }                                                               \
> +     __u.__val;                                                      \
> +} while (0)

'creative' use of amoswap and amoor :-)

You should really look at a normal load with ordering instruction
though, that amoor.aq is a rmw and will promote the cacheline to
exclusive (and dirty it).

> +/*
> + * Simple spin lock operations.  These provide no fairness guarantees.
> + */
> +
> +/* FIXME: Replace this with a ticket lock, like MIPS. */
> +
> +#define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
> +#define arch_spin_is_locked(x)       ((x)->lock != 0)
> +#define arch_spin_unlock_wait(x) \
> +             do { cpu_relax(); } while ((x)->lock)
> +
> +static inline void arch_spin_unlock(arch_spinlock_t *lock)
> +{
> +     __asm__ __volatile__ (
> +             "amoswap.w.rl x0, x0, %0"
> +             : "=A" (lock->lock)
> +             :: "memory");
> +}
> +
> +static inline int arch_spin_trylock(arch_spinlock_t *lock)
> +{
> +     int tmp = 1, busy;
> +
> +     __asm__ __volatile__ (
> +             "amoswap.w.aq %0, %2, %1"
> +             : "=r" (busy), "+A" (lock->lock)
> +             : "r" (tmp)
> +             : "memory");
> +
> +     return !busy;
> +}
> +
> +static inline void arch_spin_lock(arch_spinlock_t *lock)
> +{
> +     while (1) {
> +             if (arch_spin_is_locked(lock))
> +                     continue;
> +
> +             if (arch_spin_trylock(lock))
> +                     break;
> +     }
> +}

OK, so back to smp_mb__{before,after}_spinlock(), that wants to order
things like:

        wakeup:                                 block:

        COND = 1;                               p->state = UNINTERRUPTIBLE;
                                                smp_mb();
        smp_mb__before_spinlock();
        spin_lock(&lock);                       if (!COND)
                                                  schedule()
        if (p->state & state)
                goto out;


And here it is important that the COND store not happen _after_ the
p->state load.

Now, your spin_lock() only implies the AQ thing, which should only
constraint later load/stores but does nothing for the prior load/stores.
So our COND store can drop into the lock and even happen after the
p->state load.

So you very much want your smp_mb__{before,after}_spinlock thingies to
be full barriers.

Reply via email to