I have a ticket tracking this, though I suspect my initial attempt at a patch generates the same object code as before.
@ryan, what CPU variant are you testing this on? Is this on a NUMA machine or something?

On Sat, Feb 1, 2014 at 1:58 AM, Carter Schonwald wrote:

> woops, I mean cmpxchgq
>
> On Sat, Feb 1, 2014 at 1:36 AM, Carter Schonwald wrote:
>
>> ok, I can confirm that on my 64-bit Mac, both clang and gcc use cmpxchgl
>> rather than cmpxchg
>> I'll whip up a strawman patch on HEAD that can be cherry-picked / tested
>> out by Ryan et al.
>>
>> On Sat, Feb 1, 2014 at 1:12 AM, Carter Schonwald wrote:
>>
>>> Hey Ryan,
>>> looking at this closely:
>>> Why isn't CAS using CMPXCHG8B on 64-bit architectures? Could that be the
>>> culprit?
>>>
>>> Could the issue be that we've not had a good stress test that would
>>> create values that are equal on the 32-bit range, but differ on the 64-bit
>>> range, and you're hitting that?
>>>
>>> Could you try seeing if doing that change fixes things up?
>>> (I may be completely wrong, but just throwing this out as a naive
>>> "obvious" guess)
>>>
>>> On Sat, Feb 1, 2014 at 12:58 AM, Ryan Newton wrote:
>>>
>>>> Then again... I'm having trouble seeing how the spec on page 3-149 of
>>>> the Intel manual would allow the behavior I'm seeing:
>>>>
>>>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>>>>
>>>> Nevertheless, this is exactly the behavior we're seeing with the
>>>> current Haskell primops. Two threads simultaneously performing the same
>>>> CAS(p,a,b) can both think that they succeeded.
>>>>
>>>> On Sat, Feb 1, 2014 at 12:33 AM, Ryan Newton wrote:
>>>>
>>>>> I commented on the commit here:
>>>>>
>>>>> https://github.com/ghc/ghc/commit/521b792553bacbdb0eec138b150ab0626ea6f36b
>>>>>
>>>>> The problem is that our "cas" routine in SMP.h is similar to the C
>>>>> compiler intrinsic __sync_val_compare_and_swap, in that it returns the old
>>>>> value. But it seems we cannot use a comparison against that old value to
>>>>> determine whether or not the CAS succeeded. (I believe the CAS may fail
>>>>> due to contention, but the old value may happen to look like our old
>>>>> value.)
>>>>>
>>>>> Unfortunately, this didn't occur to me until it started causing bugs
>>>>> [1] [2]. Fixing casMutVar# fixes these bugs. However, the way I'm
>>>>> currently fixing CAS in the "atomic-primops" package is by using
>>>>> __sync_bool_compare_and_swap:
>>>>>
>>>>> https://github.com/rrnewton/haskell-lockfree/commit/f9716ddd94d5eff7420256de22cbf38c02322d7a#diff-be3304b3ecdd8e1f9ed316cd844d711aR200
>>>>>
>>>>> What is the best fix for GHC itself? Would it be ok for GHC to
>>>>> include a C compiler intrinsic like __sync_val_compare_and_swap?
>>>>> Otherwise we need another big ifdef'd function like "cas" in SMP.h that
>>>>> has the architecture-specific inline asm across all architectures. I can
>>>>> write the x86 one, but I'm not eager to try the others.
>>>>>
>>>>> Best,
>>>>> -Ryan
>>>>>
>>>>> [1] https://github.com/iu-parfunc/lvars/issues/70
>>>>> [2] https://github.com/rrnewton/haskell-lockfree/issues/15
>>>>
>>>> _______________________________________________
>>>> ghc-devs mailing list
>>>> [email protected]
>>>> http://www.haskell.org/mailman/listinfo/ghc-devs
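For reference, here is a minimal C sketch of the two ways of checking a CAS discussed in this thread. It assumes GCC or Clang, both of which provide the legacy __sync builtins. The first helper infers success by comparing the returned old value with the expected one, which is how a value-returning routine like cas() in SMP.h is used. The second asks __sync_bool_compare_and_swap directly, as the atomic-primops workaround does. According to the GCC documentation, __sync_val_compare_and_swap returns the contents of *ptr before the operation, and __sync_bool_compare_and_swap returns true if the comparison succeeded and the new value was written. The helper names (cas_succeeded_by_value, cas_succeeded_by_flag, low32_equal, check_cas_semantics) are hypothetical and do not come from GHC or atomic-primops. The sketch is single-threaded, so it only pins down the intended semantics and does not reproduce the race Ryan describes. The final check models Carter's width concern: two 64-bit words that agree in their low 32 bits but differ above them.

```c
#include <assert.h>
#include <stdint.h>

/* Style 1 (hypothetical helper): value-returning CAS, used the way a
   routine like cas() in SMP.h is used. Success is inferred by comparing
   the returned old value with the value the caller expected. */
static int cas_succeeded_by_value(uintptr_t *p, uintptr_t expected, uintptr_t desired)
{
    uintptr_t old = __sync_val_compare_and_swap(p, expected, desired);
    return old == expected;
}

/* Style 2 (hypothetical helper): boolean CAS, as in the atomic-primops
   workaround. The builtin itself reports whether the swap happened. */
static int cas_succeeded_by_flag(uintptr_t *p, uintptr_t expected, uintptr_t desired)
{
    return __sync_bool_compare_and_swap(p, expected, desired);
}

/* Hypothetical model of the 32-bit vs 64-bit concern: a compare that only
   looked at the low 32 bits would treat these words as equal. This is
   plain arithmetic, not an atomic operation. */
static int low32_equal(uint64_t a, uint64_t b)
{
    return (uint32_t)a == (uint32_t)b;
}

/* Single-threaded sanity check: each style reports success exactly when
   the slot held the expected value before the call. */
static void check_cas_semantics(void)
{
    uintptr_t slot = 10;

    assert(cas_succeeded_by_flag(&slot, 10, 20) == 1);  /* 10 -> 20 */
    assert(slot == 20);
    assert(cas_succeeded_by_flag(&slot, 10, 30) == 0);  /* stale expected value */
    assert(slot == 20);

    assert(cas_succeeded_by_value(&slot, 20, 40) == 1); /* 20 -> 40 */
    assert(slot == 40);
    assert(cas_succeeded_by_value(&slot, 20, 50) == 0); /* stale expected value */
    assert(slot == 40);

    /* Same low 32 bits, different high bits: only a full 64-bit compare
       tells them apart. */
    assert(low32_equal(0x100000005ULL, 0x5ULL) == 1);
    assert(0x100000005ULL != 0x5ULL);
}
```

To use it, call check_cas_semantics() from main; it returns silently if every assertion holds. A real reproduction of the reported bug would need several threads racing the same CAS(p,a,b) and counting how many of them report success.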
