Coding would be much simpler if one could use GCC __atomic builtins directly instead of ODP public and internal atomic API's. The problem with odp_atomic_u64_t sometimes (i.e. for most 32-bit architectures, ARMv7a and x86-32 the exceptions) containing a lock would go away.
-- Ola On 17 May 2016 at 09:00, Ola Liljedahl <[email protected]> wrote: > > > On 17 May 2016 at 00:45, Bill Fischofer <[email protected]> wrote: > >> >> >> On Mon, May 16, 2016 at 4:55 PM, Ola Liljedahl <[email protected]> >> wrote: >> >>> On 16 May 2016 at 23:31, Bill Fischofer <[email protected]> >>> wrote: >>> >>>> Thanks Ola. The original bug is that this fails compiling with clang >>>> on 32-bit systems and the reason is that the struct in that environment >>>> winds up being 24 bytes rather than 16 bytes long, so the >>>> ODP_STATIC_ASSERT() fails. >>>> >>> The timer code should stop using odp_atomic_u64_t (which will include >>> the spinlock for systems that do not support atomic operations on 64-bit >>> locations, ARMv7A is one of the few 32-bit platforms which do support >>> 64-bit atomics natively). Instead use a plain 64-bit datatype (uint64_t) on >>> such systems. For systems which do not support 128-bit (or 64-bit?) atomic >>> operations (which is the ideal but I think I managed to get something >>> working with only 64-bit atomics), we should use locks instead. I think the >>> code does use a separate array of locks which is indexed through a hash of >>> the timer handle or something similar. This locks makes the additional lock >>> in odp_atomic_u64_t unused. >>> >>> Which platforms use the 64-bit atomic code in the timer? >>> ARMv7A (32-bit but with 64-bit atomics support - ldrexd/strexd) >>> OCTEON/MIPS64 (with 64-bit atomics support - lld/scd) >>> 64-bit POWER? (with 64--bit atomics support - ldarx/stdcx.) >>> >>> Se we need a couple of things: >>> 1) 128-bit CAS support on e.g. x86 (requires that configure detects and >>> enables -mcx16 flag). >>> 2) 128-bit ldxp/stxp support on ARM64 (A64 ISA in AArch64 mode). I don't >>> think common compilers support 128-bit CAS on ARM64 (it's a bit tricky) so >>> need to use inline assembly. >>> 3) Some cleanup in the timer code not to use odp_atomic_u64_t on systems >>> which do not support 64-bit atomic operations (and instead uses the spin >>> lock array). This will avoid the failed static assert. >>> >> >> Unless you'd like to submit that patch this week, we need to do something >> on at least a temporary basis to close out Monarch RC3. Sounds like this is >> an area we should revisit for restructure/cleanup for Tiger Moth. Given the >> divergence between near-term need and the (better) longer-term solution you >> outline, what do you recommend for now? >> > Fix the static assert problem on certain 32-bit targets by changing > odp_atomic_u64_t to uint64_t on those targets. > > >> >> >>> >>> >>>> My original (simple) patch is at >>>> http://patches.opendataplane.org/patch/5932/ however Maxim didn't like >>>> it. Perhaps you can offer a compromise? >>>> >>>> On Mon, May 16, 2016 at 4:21 PM, Ola Liljedahl < >>>> [email protected]> wrote: >>>> >>>>> I disagree. The 128-bit support is important because that's the >>>>> lock-free timer implementation which was the whole idea. The lock-based >>>>> code is just a work-around. We should rather add support in configure to >>>>> detect compiler support for the -mcx16 flag which enables 128-bit CAS on >>>>> x86-64. Now when I have an ARM64 target, I can actually write and test >>>>> 128-bit CAS support on ARM (A64 ISA in AArch64 mode) and contribute that >>>>> in >>>>> another patch. I am not sure whether common compilers (e.g. GCC) properly >>>>> supports 128-bit CAS on ARM64, might have to use LDXP/STXP (load/store >>>>> exclusive pair) directly. >>>>> >>>>> -- Ola >>>>> >>>>> On 16 May 2016 at 22:43, Bill Fischofer <[email protected]> >>>>> wrote: >>>>> >>>>>> On Mon, May 16, 2016 at 2:15 PM, Maxim Uvarov < >>>>>> [email protected]> >>>>>> wrote: >>>>>> >>>>>> > Remove totally untested branch with 128 bit optimizations >>>>>> > for timer code. >>>>>> > This also fixes static assert bug: >>>>>> > https://bugs.linaro.org/show_bug.cgi?id=2211 >>>>>> > >>>>>> > Signed-off-by: Maxim Uvarov <[email protected]> >>>>>> > >>>>>> >>>>>> Reviewed-and-tested-by: Bill Fischofer <[email protected]> >>>>>> >>>>>> >>>>>> > --- >>>>>> > platform/linux-generic/odp_timer.c | 107 >>>>>> > ------------------------------------- >>>>>> > 1 file changed, 107 deletions(-) >>>>>> > >>>>>> > diff --git a/platform/linux-generic/odp_timer.c >>>>>> > b/platform/linux-generic/odp_timer.c >>>>>> > index 6b84309..be28da3 100644 >>>>>> > --- a/platform/linux-generic/odp_timer.c >>>>>> > +++ b/platform/linux-generic/odp_timer.c >>>>>> > @@ -11,15 +11,7 @@ >>>>>> > * >>>>>> > */ >>>>>> > >>>>>> > -/* Check if compiler supports 16-byte atomics. GCC needs -mcx16 >>>>>> flag on >>>>>> > x86 */ >>>>>> > -/* Using spin lock actually seems faster on Core2 */ >>>>>> > -#ifdef ODP_ATOMIC_U128 >>>>>> > -/* TB_NEEDS_PAD defined if sizeof(odp_buffer_t) != 8 */ >>>>>> > -#define TB_NEEDS_PAD >>>>>> > -#define TB_SET_PAD(x) ((x).pad = 0) >>>>>> > -#else >>>>>> > #define TB_SET_PAD(x) (void)(x) >>>>>> > -#endif >>>>>> > >>>>>> > #include <odp_posix_extensions.h> >>>>>> > >>>>>> > @@ -70,11 +62,9 @@ >>>>>> > * Mutual exclusion in the absence of CAS16 >>>>>> > >>>>>> > >>>>>> *****************************************************************************/ >>>>>> > >>>>>> > -#ifndef ODP_ATOMIC_U128 >>>>>> > #define NUM_LOCKS 1024 >>>>>> > static _odp_atomic_flag_t locks[NUM_LOCKS]; /* Multiple locks per >>>>>> cache >>>>>> > line! */ >>>>>> > #define IDX2LOCK(idx) (&locks[(idx) % NUM_LOCKS]) >>>>>> > -#endif >>>>>> > >>>>>> > >>>>>> > >>>>>> /****************************************************************************** >>>>>> > * Translation between timeout buffer and timeout header >>>>>> > @@ -98,17 +88,9 @@ static odp_timeout_hdr_t >>>>>> *timeout_hdr(odp_timeout_t tmo) >>>>>> > typedef struct tick_buf_s { >>>>>> > odp_atomic_u64_t exp_tck;/* Expiration tick or TMO_xxx */ >>>>>> > odp_buffer_t tmo_buf;/* ODP_BUFFER_INVALID if timer not >>>>>> active */ >>>>>> > -#ifdef TB_NEEDS_PAD >>>>>> > - uint32_t pad;/* Need to be able to access padding for >>>>>> successful >>>>>> > CAS */ >>>>>> > -#endif >>>>>> > } tick_buf_t >>>>>> > -#ifdef ODP_ATOMIC_U128 >>>>>> > -ODP_ALIGNED(16) /* 16-byte atomic operations need properly aligned >>>>>> > addresses */ >>>>>> > -#endif >>>>>> > ; >>>>>> > >>>>>> > -ODP_STATIC_ASSERT(sizeof(tick_buf_t) == 16, "sizeof(tick_buf_t) == >>>>>> 16"); >>>>>> > - >>>>>> > typedef struct odp_timer_s { >>>>>> > void *user_ptr; >>>>>> > odp_queue_t queue;/* Used for free list when timer is free >>>>>> */ >>>>>> > @@ -378,34 +360,6 @@ static bool timer_reset(uint32_t idx, >>>>>> > tick_buf_t *tb = &tp->tick_buf[idx]; >>>>>> > >>>>>> > if (tmo_buf == NULL || *tmo_buf == ODP_BUFFER_INVALID) { >>>>>> > -#ifdef ODP_ATOMIC_U128 >>>>>> > - tick_buf_t new, old; >>>>>> > - do { >>>>>> > - /* Relaxed and non-atomic read of current >>>>>> values */ >>>>>> > - old.exp_tck.v = tb->exp_tck.v; >>>>>> > - old.tmo_buf = tb->tmo_buf; >>>>>> > - TB_SET_PAD(old); >>>>>> > - /* Check if there actually is a timeout >>>>>> buffer >>>>>> > - * present */ >>>>>> > - if (old.tmo_buf == ODP_BUFFER_INVALID) { >>>>>> > - /* Cannot reset a timer with >>>>>> neither old >>>>>> > nor >>>>>> > - * new timeout buffer */ >>>>>> > - success = false; >>>>>> > - break; >>>>>> > - } >>>>>> > - /* Set up new values */ >>>>>> > - new.exp_tck.v = abs_tck; >>>>>> > - new.tmo_buf = old.tmo_buf; >>>>>> > - TB_SET_PAD(new); >>>>>> > - /* Atomic CAS will fail if we experienced >>>>>> torn >>>>>> > reads, >>>>>> > - * retry update sequence until CAS succeeds >>>>>> */ >>>>>> > - } while (!_odp_atomic_u128_cmp_xchg_mm( >>>>>> > - (_odp_atomic_u128_t *)tb, >>>>>> > - (_uint128_t *)&old, >>>>>> > - (_uint128_t *)&new, >>>>>> > - _ODP_MEMMODEL_RLS, >>>>>> > - _ODP_MEMMODEL_RLX)); >>>>>> > -#else >>>>>> > #ifdef __ARM_ARCH >>>>>> > /* Since barriers are not good for C-A15, we take an >>>>>> > * alternative approach using relaxed memory model >>>>>> */ >>>>>> > @@ -453,7 +407,6 @@ static bool timer_reset(uint32_t idx, >>>>>> > /* Release the lock */ >>>>>> > _odp_atomic_flag_clear(IDX2LOCK(idx)); >>>>>> > #endif >>>>>> > -#endif >>>>>> > } else { >>>>>> > /* We have a new timeout buffer which replaces any >>>>>> old one >>>>>> > */ >>>>>> > /* Fill in some (constant) header fields for timeout >>>>>> > events */ >>>>>> > @@ -468,19 +421,6 @@ static bool timer_reset(uint32_t idx, >>>>>> > } >>>>>> > /* Else ignore buffers of other types */ >>>>>> > odp_buffer_t old_buf = ODP_BUFFER_INVALID; >>>>>> > -#ifdef ODP_ATOMIC_U128 >>>>>> > - tick_buf_t new, old; >>>>>> > - new.exp_tck.v = abs_tck; >>>>>> > - new.tmo_buf = *tmo_buf; >>>>>> > - TB_SET_PAD(new); >>>>>> > - /* We are releasing the new timeout buffer to some >>>>>> other >>>>>> > - * thread */ >>>>>> > - _odp_atomic_u128_xchg_mm((_odp_atomic_u128_t *)tb, >>>>>> > - (_uint128_t *)&new, >>>>>> > - (_uint128_t *)&old, >>>>>> > - _ODP_MEMMODEL_ACQ_RLS); >>>>>> > - old_buf = old.tmo_buf; >>>>>> > -#else >>>>>> > /* Take a related lock */ >>>>>> > while (_odp_atomic_flag_tas(IDX2LOCK(idx))) >>>>>> > /* While lock is taken, spin using relaxed >>>>>> loads */ >>>>>> > @@ -496,7 +436,6 @@ static bool timer_reset(uint32_t idx, >>>>>> > >>>>>> > /* Release the lock */ >>>>>> > _odp_atomic_flag_clear(IDX2LOCK(idx)); >>>>>> > -#endif >>>>>> > /* Return old timeout buffer */ >>>>>> > *tmo_buf = old_buf; >>>>>> > } >>>>>> > @@ -510,18 +449,6 @@ static odp_buffer_t >>>>>> timer_cancel(odp_timer_pool *tp, >>>>>> > tick_buf_t *tb = &tp->tick_buf[idx]; >>>>>> > odp_buffer_t old_buf; >>>>>> > >>>>>> > -#ifdef ODP_ATOMIC_U128 >>>>>> > - tick_buf_t new, old; >>>>>> > - /* Update the timer state (e.g. cancel the current timeout) >>>>>> */ >>>>>> > - new.exp_tck.v = new_state; >>>>>> > - /* Swap out the old buffer */ >>>>>> > - new.tmo_buf = ODP_BUFFER_INVALID; >>>>>> > - TB_SET_PAD(new); >>>>>> > - _odp_atomic_u128_xchg_mm((_odp_atomic_u128_t *)tb, >>>>>> > - (_uint128_t *)&new, (_uint128_t >>>>>> *)&old, >>>>>> > - _ODP_MEMMODEL_RLX); >>>>>> > - old_buf = old.tmo_buf; >>>>>> > -#else >>>>>> > /* Take a related lock */ >>>>>> > while (_odp_atomic_flag_tas(IDX2LOCK(idx))) >>>>>> > /* While lock is taken, spin using relaxed loads */ >>>>>> > @@ -537,7 +464,6 @@ static odp_buffer_t timer_cancel(odp_timer_pool >>>>>> *tp, >>>>>> > >>>>>> > /* Release the lock */ >>>>>> > _odp_atomic_flag_clear(IDX2LOCK(idx)); >>>>>> > -#endif >>>>>> > /* Return the old buffer */ >>>>>> > return old_buf; >>>>>> > } >>>>>> > @@ -548,34 +474,6 @@ static unsigned timer_expire(odp_timer_pool >>>>>> *tp, >>>>>> > uint32_t idx, uint64_t tick) >>>>>> > tick_buf_t *tb = &tp->tick_buf[idx]; >>>>>> > odp_buffer_t tmo_buf = ODP_BUFFER_INVALID; >>>>>> > uint64_t exp_tck; >>>>>> > -#ifdef ODP_ATOMIC_U128 >>>>>> > - /* Atomic re-read for correctness */ >>>>>> > - exp_tck = _odp_atomic_u64_load_mm(&tb->exp_tck, >>>>>> _ODP_MEMMODEL_RLX); >>>>>> > - /* Re-check exp_tck */ >>>>>> > - if (odp_likely(exp_tck <= tick)) { >>>>>> > - /* Attempt to grab timeout buffer, replace with >>>>>> inactive >>>>>> > timer >>>>>> > - * and invalid buffer */ >>>>>> > - tick_buf_t new, old; >>>>>> > - old.exp_tck.v = exp_tck; >>>>>> > - old.tmo_buf = tb->tmo_buf; >>>>>> > - TB_SET_PAD(old); >>>>>> > - /* Set the inactive/expired bit keeping the >>>>>> expiration >>>>>> > tick so >>>>>> > - * that we can check against the expiration tick of >>>>>> the >>>>>> > timeout >>>>>> > - * when it is received */ >>>>>> > - new.exp_tck.v = exp_tck | TMO_INACTIVE; >>>>>> > - new.tmo_buf = ODP_BUFFER_INVALID; >>>>>> > - TB_SET_PAD(new); >>>>>> > - int succ = _odp_atomic_u128_cmp_xchg_mm( >>>>>> > - (_odp_atomic_u128_t *)tb, >>>>>> > - (_uint128_t *)&old, (_uint128_t >>>>>> *)&new, >>>>>> > - _ODP_MEMMODEL_RLS, >>>>>> _ODP_MEMMODEL_RLX); >>>>>> > - if (succ) >>>>>> > - tmo_buf = old.tmo_buf; >>>>>> > - /* Else CAS failed, something changed => skip timer >>>>>> > - * this tick, it will be checked again next tick */ >>>>>> > - } >>>>>> > - /* Else false positive, ignore */ >>>>>> > -#else >>>>>> > /* Take a related lock */ >>>>>> > while (_odp_atomic_flag_tas(IDX2LOCK(idx))) >>>>>> > /* While lock is taken, spin using relaxed loads */ >>>>>> > @@ -600,7 +498,6 @@ static unsigned timer_expire(odp_timer_pool *tp, >>>>>> > uint32_t idx, uint64_t tick) >>>>>> > /* Else false positive, ignore */ >>>>>> > /* Release the lock */ >>>>>> > _odp_atomic_flag_clear(IDX2LOCK(idx)); >>>>>> > -#endif >>>>>> > if (odp_likely(tmo_buf != ODP_BUFFER_INVALID)) { >>>>>> > /* Fill in expiration tick for timeout events */ >>>>>> > if (odp_event_type(odp_buffer_to_event(tmo_buf)) == >>>>>> > @@ -974,13 +871,9 @@ void odp_timeout_free(odp_timeout_t tmo) >>>>>> > >>>>>> > int odp_timer_init_global(void) >>>>>> > { >>>>>> > -#ifndef ODP_ATOMIC_U128 >>>>>> > uint32_t i; >>>>>> > for (i = 0; i < NUM_LOCKS; i++) >>>>>> > _odp_atomic_flag_clear(&locks[i]); >>>>>> > -#else >>>>>> > - ODP_DBG("Using lock-less timer implementation\n"); >>>>>> > -#endif >>>>>> > odp_atomic_init_u32(&num_timer_pools, 0); >>>>>> > >>>>>> > block_sigalarm(); >>>>>> > -- >>>>>> > 2.7.1.250.gff4ea60 >>>>>> > >>>>>> > _______________________________________________ >>>>>> > lng-odp mailing list >>>>>> > [email protected] >>>>>> > https://lists.linaro.org/mailman/listinfo/lng-odp >>>>>> > >>>>>> _______________________________________________ >>>>>> lng-odp mailing list >>>>>> [email protected] >>>>>> https://lists.linaro.org/mailman/listinfo/lng-odp >>>>>> >>>>> >>>>> >>>> >>> >> > _______________________________________________ lng-odp mailing list [email protected] https://lists.linaro.org/mailman/listinfo/lng-odp
