I got the test suite running on my (2 core) machine mac book air, with 7.8 i've run it several times, not seeing any failures
On Sat, Feb 1, 2014 at 1:03 PM, Carter Schonwald <[email protected] > wrote: > hrmmmm I have a crazy idea > > "Compare RAX with r/m64. If equal, ZF is set and r64 is loaded into r/m64. > Else, clear ZF > and load r/m64 into RAX." is what the docs say for the cmpxchng > instruction > > so RAX is the old values, (EAX in the 32bit case). And it looks like we > dont' set that explicitly when calling the asm .. CMPXCHG r/m64, r64 > > hrmmm > > > > On Sat, Feb 1, 2014 at 12:52 PM, Ryan Newton <[email protected]> wrote: > >> Ok, here's another experiment, on this commit: >> >> >> https://github.com/rrnewton/haskell-lockfree/commit/399bb19fa02eaf2f2eab5d02c4b608535362f9bc >> >> Here, if I use GCC's __sync_val_compare_and_swap instead of GHC's version >> of cas(), the problem also goes away. I think these two implementations >> should behave identically, and that they don't perhaps indicates that there >> is something off about the inline asm, as Carter was suggesting: >> >> *#if i386_HOST_ARCH || x86_64_HOST_ARCH* >> * __asm__ __volatile__ (* >> * "lock\ncmpxchg %3,%1"* >> * :"=a"(o), "=m" (*(volatile unsigned int *)p) * >> * :"0" (o), "r" (n));* >> * return o;* >> >> The x86 CAS instruction must put the "return value" in the accumulator >> register, and indeed this constrains "o" to be allocated to the accumulator >> register, while the new value "n" can be in any register. >> >> So if there's a problem, I don't know what it is. Except I'm not sure >> what the ramifications of "o" being a function parameter AND having an "=a" >> constraint on it are... >> >> -Ryan >> >> >> >> On Sat, Feb 1, 2014 at 11:27 AM, Carter Schonwald < >> [email protected]> wrote: >> >>> Hey Ryan, i've made the leap to using 7.8 on my machine, so i'll first >>> have to get some pull requests in on atomic-primops before I can test it >>> locally :), expect those patches later today! >>> >>> looks like gcc's inline ASM logic is pretty correct, after testing it a >>> bit locally, pardon my speculative jumping the gun. >>> >>> >>> On Sat, Feb 1, 2014 at 9:10 AM, Ryan Newton <[email protected]> wrote: >>> >>>> Hi Carter & others, >>>> >>>> Carter, yes, this is CAS on pointers and in my next mail I'll try to >>>> come up with some hypotheses as to why we may have (remaining) problems >>>> there. >>>> >>>> But first, I have been assured that on x86 there is no failure mode in >>>> which doing a comparison on the value read by CAS should not correctly >>>> diagnose success or failure (same as directly reading the Zero Flag) [1]. >>>> >>>> And yet, there's this discrepancy, where the modified casMutVar that I >>>> linked to does not have the failure. As for reproducing the failure, >>>> either of the two following tests will currently show problems: >>>> >>>> - Two threads try to casIORef False->True, both succeed >>>> - 120 threads try to read, increment, CAS until they succeed. The >>>> total is often not 120 because multiple threads think the successfully >>>> incremented, say, 33->34. >>>> >>>> Here's a specific recipe for the latter test on GHC 7.6.3 Mac or Linux: >>>> >>>> >>>> *git clone [email protected]:rrnewton/haskell-lockfree-queue.git * >>>> *cd haskell-lockfree-queue/AtomicPrimops/* >>>> >>>> *git checkout 1a1e7e55f6706f9e5754* >>>> >>>> *cabal sandbox init* >>>> >>>> *cabal install -f-withTH -fforeign ./ ./testing --enable-tests* >>>> >>>> *./testing/dist/dist-sandbox-*/build/test-atomic-primops/test-atomic-primops >>>> -t n_threads* >>>> >>>> You may have to run the last line several times to see the failure. >>>> >>>> Best, >>>> -Ryan >>>> >>>> [1] I guess the __sync_bool_compare_and_swap intrinsic which reads ZF >>>> is there just to avoid the extra comparison. >>>> >>>> [2] P.S. I'd like to try this on GHC head, but the RHEL 6 machine I >>>> usually use to build it is currently not validating (below error, commit >>>> 65d05d7334). After I debug this gmp problem I'll confirm that the bug >>>> under discussion applies on the 7.8 branch. >>>> >>>> ./sync-all checkout ghc-7.8 >>>> sh validate >>>> ... >>>> /usr/bin/ld: libraries/integer-gmp/gmp/objs/aors.o: relocation >>>> R_X86_64_32 against `__gmpz_sub' can not be used when making a shared >>>> object; recompile with -fPIC >>>> libraries/integer-gmp/gmp/objs/aors.o: could not read symbols: Bad value >>>> >>>> >>>> >>>> >>>> On Sat, Feb 1, 2014 at 2:55 AM, Carter Schonwald < >>>> [email protected]> wrote: >>>> >>>>> Ryan, is your benchmark using CAS on pointers, or immediate words? >>>>> trying to get atomic primops to build on my 7.8 build on my mac >>>>> >>>>> >>>>> On Sat, Feb 1, 2014 at 2:44 AM, Carter Schonwald < >>>>> [email protected]> wrote: >>>>> >>>>>> https://ghc.haskell.org/trac/ghc/ticket/8724#ticket is the ticket >>>>>> >>>>>> when i'm more awake i'll experiment some more >>>>>> >>>>>> >>>>>> On Sat, Feb 1, 2014 at 2:33 AM, Carter Schonwald < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> i have a ticket for tracking this, though i'm thinking my initial >>>>>>> attempt at a patch generates the same object code as it did before. >>>>>>> >>>>>>> @ryan, what CPU variant are you testing this on? is this on a NUMA >>>>>>> machine or something? >>>>>>> >>>>>>> >>>>>>> On Sat, Feb 1, 2014 at 1:58 AM, Carter Schonwald < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> woops, i mean cmpxchgq >>>>>>>> >>>>>>>> >>>>>>>> On Sat, Feb 1, 2014 at 1:36 AM, Carter Schonwald < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> ok, i can confirm that on my 64bit mac, both clang and gcc >>>>>>>>> use cmpxchgl rather than cmpxchg >>>>>>>>> i'll whip up a strawman patch on head that can be cherrypicked / >>>>>>>>> tested out by ryan et al >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sat, Feb 1, 2014 at 1:12 AM, Carter Schonwald < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> Hey Ryan, >>>>>>>>>> looking at this closely >>>>>>>>>> Why isn't CAS using CMPXCHG8B on 64bit architectures? Could that >>>>>>>>>> be the culprit? >>>>>>>>>> >>>>>>>>>> Could the issue be that we've not had a good stress test that >>>>>>>>>> would create values that are equal on the 32bit range, but differ on >>>>>>>>>> the >>>>>>>>>> 64bit range, and you're hitting that? >>>>>>>>>> >>>>>>>>>> Could you try seeing if doing that change fixes things up? >>>>>>>>>> (I may be completely wrong, but just throwing this out as a naive >>>>>>>>>> "obvious" guess) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Sat, Feb 1, 2014 at 12:58 AM, Ryan Newton >>>>>>>>>> <[email protected]>wrote: >>>>>>>>>> >>>>>>>>>>> Then again... I'm having trouble seeing how the spec on page >>>>>>>>>>> 3-149 of the Intel manual would allow the behavior I'm seeing: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf >>>>>>>>>>> >>>>>>>>>>> Nevertheless, this is exactly the behavior we're seeing with the >>>>>>>>>>> current Haskell primops. Two threads simultaneously performing the >>>>>>>>>>> same >>>>>>>>>>> CAS(p,a,b) can both think that they succeeded. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Sat, Feb 1, 2014 at 12:33 AM, Ryan Newton <[email protected] >>>>>>>>>>> > wrote: >>>>>>>>>>> >>>>>>>>>>>> I commented on the commit here: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b >>>>>>>>>>>> >>>>>>>>>>>> The problem is that our "cas" routine in SMP.h is similar to >>>>>>>>>>>> the C compiler intrinsic __sync_val_compare_and_swap, in that it >>>>>>>>>>>> returns >>>>>>>>>>>> the old value. But it seems we cannot use a comparison against >>>>>>>>>>>> that old >>>>>>>>>>>> value to determine whether or not the CAS succeeded. (I believe >>>>>>>>>>>> the CAS >>>>>>>>>>>> may fail due to contention, but the old value may happen to look >>>>>>>>>>>> like our >>>>>>>>>>>> old value.) >>>>>>>>>>>> >>>>>>>>>>>> Unfortunately, this didn't occur to me until it started causing >>>>>>>>>>>> bugs [1] [2]. Fixing casMutVar# fixes these bugs. However, the >>>>>>>>>>>> way I'm >>>>>>>>>>>> currently fixing CAS in the "atomic-primops" package is by using >>>>>>>>>>>> __sync_bool_compare_and_swap: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200 >>>>>>>>>>>> >>>>>>>>>>>> What is the best fix for GHC itself? Would it be ok for GHC >>>>>>>>>>>> to include a C compiler intrinsic like __sync_val_compare_and_swap? >>>>>>>>>>>> Otherwise we need another big ifdbef'd function like "cas" in >>>>>>>>>>>> SMP.h that >>>>>>>>>>>> has the architecture-specific inline asm across all architectures. >>>>>>>>>>>> I can >>>>>>>>>>>> write the x86 one, but I'm not eager to try the others. >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> -Ryan >>>>>>>>>>>> >>>>>>>>>>>> [1] https://github.com/iu-parfunc/lvars/issues/70 >>>>>>>>>>>> [2] https://github.com/rrnewton/haskell-lockfree/issues/15 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> ghc-devs mailing list >>>>>>>>>>> [email protected] >>>>>>>>>>> http://www.haskell.org/mailman/listinfo/ghc-devs >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >
_______________________________________________ ghc-devs mailing list [email protected] http://www.haskell.org/mailman/listinfo/ghc-devs
