Hi Carter, Yes, SMP.h is where I've copy pasted the duplicate functionality from (since I can't presently rely on linking the symbols).
Your proposal for the LLVM backend sounds **great**. But it also is going to provide additional constraints for getting "atomic-primops" right. The goal of atomic-primops is to be a stable Haskell-level interface into the relevant CAS and fetch-and-add stuff. The reason this is important is that one has to be very careful to defeat the GHC optimizer in all the relevant places and make pointer equality a reliable property. I would like to get atomic-primops to work reliably in 7.4, 7.6 [and 7.8] and have more "native" support in future GHC releases, where maybe the foreign primops would become unecessary. (They are a pain and have already exposed one blocking cabal bug, fixed in upcoming 1.17.) A couple additional suggestions for the proposal in ticket #7883: - we should use more unique symbols than "cas", especially for this rewriting trick. How about "ghc_cas" or something? - it would be great to get at least fetch-and-add in addition to CAS and barriers - if we reliably provide this set of special symbols, libraries like atomic-primops may use them in the .cmm and benefit from the CMM->LLVM substitutions - if we include all the primops I need in GHC proper the previous bullet will stop applying ;-) Cheers, -Ryan P.S. Just as a bit of motivation, here are some recent performance numbers. We often wonder about how close our "pure values in a box" approach comes to efficient lock-free structures. Well here are some numbers about using a proper unboxed counter in the Haskell heap, vs using an IORef Int and atomicModifyIORef': Up to 100X performance difference on some platforms for microbenchmarks that hammer a counter: https://github.com/rrnewton/haskell-lockfree-queue/blob/fb12d1121690553e4f737af258848f279147ea24/AtomicPrimops/DEVLOG.md#20130718-timing-atomic-counter-ops And here are the performance and scaling advantages of using ChaseLev (based on atomic-primops), over a traditional pure-in-a-box structure (IORef Data.Seq). The following are timings of ChaseLev/traditional respectively on a 32 core westmere: fib(42) 1 threads: 21s fib(42) 2 threads: 10.1s fib(42) 4 threads: 5.2s (100%prod) fib(42) 8 threads: 2.7s - 3.2s (100%prod) fib(42) 16 threads: 1.28s fib(42) 24 threads: 1.85s fib(42) 32 threads: 4.8s (high variance) (hive) fib(42) 1 threads: 41.8s (95% prod) (hive) fib(42) 2 threads: 25.2s (66% prod) (hive) fib(42) 4 threads: 14.6s (27% prod, 135GB alloc) (hive) fib(42) 8 threads: 17.1s (26% prod) (hive) fib(42) 16 threads: 16.3s (13% prod) (hive) fib(42) 24 threads: 21.2s (30% prod) (hive) fib(42) 32 threads: 29.3s (33% prod) And that is WITH the inefficiency of doing a "ccall" on every single atomic operation. Notes on parfib performance are here: https://github.com/rrnewton/haskell-lockfree-queue/blob/d6d3e9eda2a487a5f055b1f51423954bb6b6bdfa/ChaseLev/Test.hs#L158 On Fri, Jul 19, 2013 at 5:05 PM, Carter Schonwald < carter.schonw...@gmail.com> wrote: > ryan, the relevant machinery on the C side is here, see > ./includes/stg/SMP.h : > https://github.com/ghc/ghc/blob/7cc8a3cc5c2970009b83844ff9cc4e27913b8559/includes/stg/SMP.h > > (unless i'm missing something) > > > On Fri, Jul 19, 2013 at 4:53 PM, Carter Schonwald < > carter.schonw...@gmail.com> wrote: > >> Ryan, >> if you look at line 270, you'll see the CAS is a C call >> https://github.com/ghc/ghc/blob/95e6865ecf06b2bd80fa737e4fa4a24beaae25c5/rts/PrimOps.cmm#L270 >> >> >> What Simon is alluding to is some work I started (but need to finish) >> http://ghc.haskell.org/trac/ghc/ticket/7883 is the relevant ticket, and >> I'll need to sort out doing the same on the native code gen too >> >> there ARE no write barrier primops, they're baked into the CAS machinery >> in ghc's rts >> >> >> On Fri, Jul 19, 2013 at 1:02 PM, Ryan Newton <rrnew...@gmail.com> wrote: >> >>> Yes, I'd absolutely rather not suffer C call overhead for these >>> functions (or the CAS functions). But isn't that how it's done currently >>> for the casMutVar# primop? >>> >>> >>> https://github.com/ghc/ghc/blob/95e6865ecf06b2bd80fa737e4fa4a24beaae25c5/rts/PrimOps.cmm#L265 >>> >>> To avoid the overhead, is it necessary to make each primop in-line >>> rather than out-of-line, or just to get rid of the "ccall"? >>> >>> Another reason it would be good to package these with GHC is that I'm >>> having trouble building robust libraries of foreign primops that work under >>> all "ways" (e.g. GHCI). For example, this bug: >>> >>> https://github.com/rrnewton/haskell-lockfree-queue/issues/10 >>> >>> If I write .cmm code that depends on RTS functionality like >>> stg_MUT_VAR_CLEAN_info, then it seems to work fine when in compiled mode >>> (with/without threading, profiling), but I get link errors from GHCI where >>> these symbols aren't defined. >>> >>> I've got a draft of the relevant primops here: >>> >>> >>> https://github.com/rrnewton/haskell-lockfree-queue/blob/master/AtomicPrimops/cbits/primops.cmm >>> >>> Which includes: >>> >>> - variants of CAS for MutableArray# and MutableByteArray# >>> - fetch-and-add for MutableByteArray# >>> >>> Also, there are some tweaks to support the new "ticketed" interface for >>> safer CAS: >>> >>> >>> http://hackage.haskell.org/packages/archive/atomic-primops/0.3/doc/html/Data-Atomics.html#g:3 >>> >>> I started adding some of these primops to GHC proper (still as >>> out-of-line), but not all of them. I had gone with the foreign primop >>> route instead... >>> >>> https://github.com/rrnewton/ghc/commits/master >>> >>> -Ryan >>> >>> P.S. Where is the write barrier primop? I don't see it listed in >>> prelude/primops.txt... >>> >>> >>> >>> >>> >>> On Fri, Jul 19, 2013 at 11:41 AM, Carter Schonwald < >>> carter.schonw...@gmail.com> wrote: >>> >>>> I guess I should find the time to finish the CAS primop work I >>>> volunteered to do then. Ill look into in a few days. >>>> >>>> >>>> On Friday, July 19, 2013, Simon Marlow wrote: >>>> >>>>> On 18/07/13 14:17, Ryan Newton wrote: >>>>> >>>>>> The "atomic-primops" library depends on symbols such as >>>>>> store_load_barrier and "cas", which are defined in SMP.h. Thus the >>>>>> result is that if the program is linked WITHOUT "-threaded", the user >>>>>> gets a linker error about undefined symbols. >>>>>> >>>>>> The specific place it's used is in the 'foreign "C"' bits of this >>>>>> .cmm code: >>>>>> >>>>>> https://github.com/rrnewton/**haskell-lockfree-queue/blob/** >>>>>> 87e63b21b2a6c375e93c30b98c28c1**d04f88781c/AtomicPrimops/** >>>>>> cbits/primops.cmm<https://github.com/rrnewton/haskell-lockfree-queue/blob/87e63b21b2a6c375e93c30b98c28c1d04f88781c/AtomicPrimops/cbits/primops.cmm> >>>>>> >>>>>> I'm trying to explore hacks that will enable me to pull in those >>>>>> functions during compile time, without duplicating a whole bunch of >>>>>> code >>>>>> from the RTS. But it's a fragile business. >>>>>> >>>>>> It seems to me that some of these routines have general utility. In >>>>>> future versions of GHC, could we consider linking in those routines >>>>>> irrespective of "-threaded"? >>>>>> >>>>> >>>>> We should make the non-THREADED versions EXTERN_INLINE too, so that >>>>> there will be (empty) functions to call in rts/Inlines.c. Want to submit >>>>> a >>>>> patch? >>>>> >>>>> A better solution would be to make them into primops. You don't >>>>> really want to be calling out to a C function to implement a memory >>>>> barrier. We have this for write_barrier(), but none of the others so far. >>>>> Of couse that's a larger change. >>>>> >>>>> Cheers, >>>>> Simon >>>>> >>>>> >>>>> >>>>> ______________________________**_________________ >>>>> ghc-devs mailing list >>>>> ghc-devs@haskell.org >>>>> http://www.haskell.org/**mailman/listinfo/ghc-devs<http://www.haskell.org/mailman/listinfo/ghc-devs> >>>>> >>>> >>> >> >
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs