Hey Ryan, i've made the leap to using 7.8 on my machine, so i'll first have to get some pull requests in on atomic-primops before I can test it locally :), expect those patches later today!
looks like gcc's inline ASM logic is pretty correct, after testing it a bit locally, pardon my speculative jumping the gun. On Sat, Feb 1, 2014 at 9:10 AM, Ryan Newton <[email protected]> wrote: > Hi Carter & others, > > Carter, yes, this is CAS on pointers and in my next mail I'll try to come > up with some hypotheses as to why we may have (remaining) problems there. > > But first, I have been assured that on x86 there is no failure mode in > which doing a comparison on the value read by CAS should not correctly > diagnose success or failure (same as directly reading the Zero Flag) [1]. > > And yet, there's this discrepancy, where the modified casMutVar that I > linked to does not have the failure. As for reproducing the failure, > either of the two following tests will currently show problems: > > - Two threads try to casIORef False->True, both succeed > - 120 threads try to read, increment, CAS until they succeed. The > total is often not 120 because multiple threads think the successfully > incremented, say, 33->34. > > Here's a specific recipe for the latter test on GHC 7.6.3 Mac or Linux: > > > *git clone [email protected]:rrnewton/haskell-lockfree-queue.git * > *cd haskell-lockfree-queue/AtomicPrimops/* > > *git checkout 1a1e7e55f6706f9e5754* > > *cabal sandbox init* > > *cabal install -f-withTH -fforeign ./ ./testing --enable-tests* > > *./testing/dist/dist-sandbox-*/build/test-atomic-primops/test-atomic-primops > -t n_threads* > > You may have to run the last line several times to see the failure. > > Best, > -Ryan > > [1] I guess the __sync_bool_compare_and_swap intrinsic which reads ZF is > there just to avoid the extra comparison. > > [2] P.S. I'd like to try this on GHC head, but the RHEL 6 machine I > usually use to build it is currently not validating (below error, commit > 65d05d7334). After I debug this gmp problem I'll confirm that the bug > under discussion applies on the 7.8 branch. > > ./sync-all checkout ghc-7.8 > sh validate > ... > /usr/bin/ld: libraries/integer-gmp/gmp/objs/aors.o: relocation R_X86_64_32 > against `__gmpz_sub' can not be used when making a shared object; recompile > with -fPIC > libraries/integer-gmp/gmp/objs/aors.o: could not read symbols: Bad value > > > > > On Sat, Feb 1, 2014 at 2:55 AM, Carter Schonwald < > [email protected]> wrote: > >> Ryan, is your benchmark using CAS on pointers, or immediate words? trying >> to get atomic primops to build on my 7.8 build on my mac >> >> >> On Sat, Feb 1, 2014 at 2:44 AM, Carter Schonwald < >> [email protected]> wrote: >> >>> https://ghc.haskell.org/trac/ghc/ticket/8724#ticket is the ticket >>> >>> when i'm more awake i'll experiment some more >>> >>> >>> On Sat, Feb 1, 2014 at 2:33 AM, Carter Schonwald < >>> [email protected]> wrote: >>> >>>> i have a ticket for tracking this, though i'm thinking my initial >>>> attempt at a patch generates the same object code as it did before. >>>> >>>> @ryan, what CPU variant are you testing this on? is this on a NUMA >>>> machine or something? >>>> >>>> >>>> On Sat, Feb 1, 2014 at 1:58 AM, Carter Schonwald < >>>> [email protected]> wrote: >>>> >>>>> woops, i mean cmpxchgq >>>>> >>>>> >>>>> On Sat, Feb 1, 2014 at 1:36 AM, Carter Schonwald < >>>>> [email protected]> wrote: >>>>> >>>>>> ok, i can confirm that on my 64bit mac, both clang and gcc >>>>>> use cmpxchgl rather than cmpxchg >>>>>> i'll whip up a strawman patch on head that can be cherrypicked / >>>>>> tested out by ryan et al >>>>>> >>>>>> >>>>>> On Sat, Feb 1, 2014 at 1:12 AM, Carter Schonwald < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> Hey Ryan, >>>>>>> looking at this closely >>>>>>> Why isn't CAS using CMPXCHG8B on 64bit architectures? Could that be >>>>>>> the culprit? >>>>>>> >>>>>>> Could the issue be that we've not had a good stress test that would >>>>>>> create values that are equal on the 32bit range, but differ on the 64bit >>>>>>> range, and you're hitting that? >>>>>>> >>>>>>> Could you try seeing if doing that change fixes things up? >>>>>>> (I may be completely wrong, but just throwing this out as a naive >>>>>>> "obvious" guess) >>>>>>> >>>>>>> >>>>>>> On Sat, Feb 1, 2014 at 12:58 AM, Ryan Newton <[email protected]>wrote: >>>>>>> >>>>>>>> Then again... I'm having trouble seeing how the spec on page 3-149 >>>>>>>> of the Intel manual would allow the behavior I'm seeing: >>>>>>>> >>>>>>>> >>>>>>>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf >>>>>>>> >>>>>>>> Nevertheless, this is exactly the behavior we're seeing with the >>>>>>>> current Haskell primops. Two threads simultaneously performing the >>>>>>>> same >>>>>>>> CAS(p,a,b) can both think that they succeeded. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Sat, Feb 1, 2014 at 12:33 AM, Ryan Newton <[email protected]>wrote: >>>>>>>> >>>>>>>>> I commented on the commit here: >>>>>>>>> >>>>>>>>> >>>>>>>>> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b >>>>>>>>> >>>>>>>>> The problem is that our "cas" routine in SMP.h is similar to the C >>>>>>>>> compiler intrinsic __sync_val_compare_and_swap, in that it returns >>>>>>>>> the old >>>>>>>>> value. But it seems we cannot use a comparison against that old >>>>>>>>> value to >>>>>>>>> determine whether or not the CAS succeeded. (I believe the CAS may >>>>>>>>> fail >>>>>>>>> due to contention, but the old value may happen to look like our old >>>>>>>>> value.) >>>>>>>>> >>>>>>>>> Unfortunately, this didn't occur to me until it started causing >>>>>>>>> bugs [1] [2]. Fixing casMutVar# fixes these bugs. However, the way >>>>>>>>> I'm >>>>>>>>> currently fixing CAS in the "atomic-primops" package is by using >>>>>>>>> __sync_bool_compare_and_swap: >>>>>>>>> >>>>>>>>> >>>>>>>>> https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200 >>>>>>>>> >>>>>>>>> What is the best fix for GHC itself? Would it be ok for GHC to >>>>>>>>> include a C compiler intrinsic like __sync_val_compare_and_swap? >>>>>>>>> Otherwise >>>>>>>>> we need another big ifdbef'd function like "cas" in SMP.h that has the >>>>>>>>> architecture-specific inline asm across all architectures. I can >>>>>>>>> write the >>>>>>>>> x86 one, but I'm not eager to try the others. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> -Ryan >>>>>>>>> >>>>>>>>> [1] https://github.com/iu-parfunc/lvars/issues/70 >>>>>>>>> [2] https://github.com/rrnewton/haskell-lockfree/issues/15 >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> ghc-devs mailing list >>>>>>>> [email protected] >>>>>>>> http://www.haskell.org/mailman/listinfo/ghc-devs >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >
_______________________________________________ ghc-devs mailing list [email protected] http://www.haskell.org/mailman/listinfo/ghc-devs
