also: HOLY CRAP THAT'S AWESOME performance :) (i'll be wanting to do some cache-aware parallel work stealing in the near future, so this is really really handy for me)
On Sat, Jul 20, 2013 at 2:18 AM, Ryan Newton <rrnew...@gmail.com> wrote:
> Hi Carter,
>
> Yes, SMP.h is where I've copy-pasted the duplicate functionality from
> (since I can't presently rely on linking the symbols).
>
> Your proposal for the LLVM backend sounds **great**. But it is also
> going to impose additional constraints on getting "atomic-primops" right.
>
> The goal of atomic-primops is to be a stable Haskell-level interface
> to the relevant CAS and fetch-and-add operations. This is important
> because one has to be very careful to defeat the GHC optimizer in all the
> relevant places and make pointer equality a reliable property. I would
> like to get atomic-primops to work reliably in 7.4, 7.6 [and 7.8] and
> have more "native" support in future GHC releases, where maybe the foreign
> primops would become unnecessary. (They are a pain and have already exposed
> one blocking Cabal bug, fixed in the upcoming 1.17.)
>
> A couple of additional suggestions for the proposal in ticket #7883:
>
>   - we should use more distinctive symbol names than "cas", especially for
>     this rewriting trick. How about "ghc_cas" or something?
>   - it would be great to get at least fetch-and-add in addition to CAS
>     and barriers
>   - if we reliably provide this set of special symbols, libraries like
>     atomic-primops may use them in the .cmm and benefit from the CMM->LLVM
>     substitutions
>   - if we include all the primops I need in GHC proper, the previous
>     bullet will stop applying ;-)
>
> Cheers,
> -Ryan
>
> P.S. Just as a bit of motivation, here are some recent performance
> numbers. We often wonder how close our "pure values in a box"
> approach comes to efficient lock-free structures.
> Well, here are some numbers comparing a proper unboxed counter in the
> Haskell heap against an IORef Int updated with atomicModifyIORef': up to
> a 100X performance difference on some platforms, for microbenchmarks that
> hammer a counter:
>
> https://github.com/rrnewton/haskell-lockfree-queue/blob/fb12d1121690553e4f737af258848f279147ea24/AtomicPrimops/DEVLOG.md#20130718-timing-atomic-counter-ops
>
> And here are the performance and scaling advantages of using ChaseLev
> (based on atomic-primops) over a traditional pure-in-a-box structure
> (IORef Data.Seq). The following are timings of ChaseLev and the
> traditional structure, respectively, on a 32-core Westmere:
>
>   fib(42)  1 thread:  21s
>   fib(42)  2 threads: 10.1s
>   fib(42)  4 threads: 5.2s (100% prod)
>   fib(42)  8 threads: 2.7s - 3.2s (100% prod)
>   fib(42) 16 threads: 1.28s
>   fib(42) 24 threads: 1.85s
>   fib(42) 32 threads: 4.8s (high variance)
>
>   (hive) fib(42)  1 thread:  41.8s (95% prod)
>   (hive) fib(42)  2 threads: 25.2s (66% prod)
>   (hive) fib(42)  4 threads: 14.6s (27% prod, 135GB alloc)
>   (hive) fib(42)  8 threads: 17.1s (26% prod)
>   (hive) fib(42) 16 threads: 16.3s (13% prod)
>   (hive) fib(42) 24 threads: 21.2s (30% prod)
>   (hive) fib(42) 32 threads: 29.3s (33% prod)
>
> And that is WITH the inefficiency of doing a "ccall" on every single
> atomic operation.
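To make the counter comparison concrete, here is a minimal sketch of the two styles, assuming a recent GHC: a "pure value in a box" counter (an IORef Int bumped with atomicModifyIORef') and an unboxed counter, a single machine word in a MutableByteArray# bumped in place with the fetchAddIntArray# primop that later GHC releases expose through GHC.Exts. The names BoxCounter, RawCounter, incrBox, incrRaw, hammer, boxDemo and rawDemo are hypothetical; this is not the atomic-primops API and not the benchmark code linked above, so it will not reproduce those numbers.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- Sketch only: two shared counters hammered by several threads.
-- All type and function names here are hypothetical, for illustration.
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM_)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import Foreign.Storable (sizeOf)
import GHC.Exts (Int (I#), MutableByteArray#, RealWorld, fetchAddIntArray#,
                 newByteArray#, readIntArray#, writeIntArray#)
import GHC.IO (IO (..))

-- "Pure value in a box": an immutable Int swapped inside an IORef.
newtype BoxCounter = BoxCounter (IORef Int)

newBoxCounter :: IO BoxCounter
newBoxCounter = BoxCounter <$> newIORef 0

incrBox :: BoxCounter -> IO ()
incrBox (BoxCounter ref) = atomicModifyIORef' ref (\n -> (n + 1, ()))

readBox :: BoxCounter -> IO Int
readBox (BoxCounter ref) = readIORef ref

-- Unboxed counter: one machine word in a MutableByteArray#,
-- updated in place with an atomic fetch-and-add.
data RawCounter = RawCounter (MutableByteArray# RealWorld)

newRawCounter :: IO RawCounter
newRawCounter = case sizeOf (0 :: Int) of
  I# bytes -> IO $ \s0 ->
    case newByteArray# bytes s0 of
      (# s1, arr #) -> case writeIntArray# arr 0# 0# s1 of
        s2 -> (# s2, RawCounter arr #)

incrRaw :: RawCounter -> IO ()
incrRaw (RawCounter arr) = IO $ \s0 ->
  -- Adds 1 to word 0 and returns the previous value, which we discard.
  case fetchAddIntArray# arr 0# 1# s0 of
    (# s1, _previous #) -> (# s1, () #)

readRaw :: RawCounter -> IO Int
readRaw (RawCounter arr) = IO $ \s0 ->
  case readIntArray# arr 0# s0 of
    (# s1, n #) -> (# s1, I# n #)

-- Run n threads that each perform k actions, then wait for all of them.
hammer :: Int -> Int -> IO a -> IO ()
hammer n k action = do
  done <- newEmptyMVar
  forM_ [1 .. n] $ \_ -> forkIO (replicateM_ k action >> putMVar done ())
  replicateM_ n (takeMVar done)

-- Usage: n threads x k increments on each kind of counter.
boxDemo :: Int -> Int -> IO Int
boxDemo n k = do
  c <- newBoxCounter
  hammer n k (incrBox c)
  readBox c

rawDemo :: Int -> Int -> IO Int
rawDemo n k = do
  c <- newRawCounter
  hammer n k (incrRaw c)
  readRaw c
```

Both boxDemo 4 10000 and rawDemo 4 10000 should return 40000; the sketch is about cost, not result. Each atomicModifyIORef' allocates a new boxed Int and swaps a pointer, whereas fetchAddIntArray# adds to a word in place, which is one plausible contributor to the gap reported above.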
>
> Notes on parfib performance are here:
>
> https://github.com/rrnewton/haskell-lockfree-queue/blob/d6d3e9eda2a487a5f055b1f51423954bb6b6bdfa/ChaseLev/Test.hs#L158
>
> On Fri, Jul 19, 2013 at 5:05 PM, Carter Schonwald <carter.schonw...@gmail.com> wrote:
>
>> ryan, the relevant machinery on the C side is here, see
>> ./includes/stg/SMP.h:
>> https://github.com/ghc/ghc/blob/7cc8a3cc5c2970009b83844ff9cc4e27913b8559/includes/stg/SMP.h
>>
>> (unless i'm missing something)
>>
>> On Fri, Jul 19, 2013 at 4:53 PM, Carter Schonwald <carter.schonw...@gmail.com> wrote:
>>
>>> Ryan,
>>> if you look at line 270, you'll see the CAS is a C call:
>>> https://github.com/ghc/ghc/blob/95e6865ecf06b2bd80fa737e4fa4a24beaae25c5/rts/PrimOps.cmm#L270
>>>
>>> What Simon is alluding to is some work I started (but need to finish);
>>> http://ghc.haskell.org/trac/ghc/ticket/7883 is the relevant ticket, and
>>> I'll need to sort out doing the same on the native code gen too.
>>>
>>> there ARE no write barrier primops; they're baked into the CAS machinery
>>> in ghc's rts.
>>>
>>> On Fri, Jul 19, 2013 at 1:02 PM, Ryan Newton <rrnew...@gmail.com> wrote:
>>>
>>>> Yes, I'd absolutely rather not suffer C call overhead for these
>>>> functions (or the CAS functions). But isn't that how it's currently done
>>>> for the casMutVar# primop?
>>>>
>>>> https://github.com/ghc/ghc/blob/95e6865ecf06b2bd80fa737e4fa4a24beaae25c5/rts/PrimOps.cmm#L265
>>>>
>>>> To avoid the overhead, is it necessary to make each primop in-line
>>>> rather than out-of-line, or just to get rid of the "ccall"?
>>>>
>>>> Another reason it would be good to package these with GHC is that I'm
>>>> having trouble building robust libraries of foreign primops that work under
>>>> all "ways" (e.g. GHCi).
>>>> For example, this bug:
>>>>
>>>> https://github.com/rrnewton/haskell-lockfree-queue/issues/10
>>>>
>>>> If I write .cmm code that depends on RTS functionality like
>>>> stg_MUT_VAR_CLEAN_info, it seems to work fine in compiled mode
>>>> (with/without threading, profiling), but I get link errors from GHCi, where
>>>> these symbols aren't defined.
>>>>
>>>> I've got a draft of the relevant primops here:
>>>>
>>>> https://github.com/rrnewton/haskell-lockfree-queue/blob/master/AtomicPrimops/cbits/primops.cmm
>>>>
>>>> which includes:
>>>>
>>>>   - variants of CAS for MutableArray# and MutableByteArray#
>>>>   - fetch-and-add for MutableByteArray#
>>>>
>>>> Also, there are some tweaks to support the new "ticketed" interface for
>>>> safer CAS:
>>>>
>>>> http://hackage.haskell.org/packages/archive/atomic-primops/0.3/doc/html/Data-Atomics.html#g:3
>>>>
>>>> I started adding some of these primops to GHC proper (still as
>>>> out-of-line), but not all of them. I had gone with the foreign primop
>>>> route instead...
>>>>
>>>> https://github.com/rrnewton/ghc/commits/master
>>>>
>>>> -Ryan
>>>>
>>>> P.S. Where is the write barrier primop? I don't see it listed in
>>>> prelude/primops.txt...
>>>>
>>>> On Fri, Jul 19, 2013 at 11:41 AM, Carter Schonwald <carter.schonw...@gmail.com> wrote:
>>>>
>>>>> I guess I should find the time to finish the CAS primop work I
>>>>> volunteered to do, then. I'll look into it in a few days.
>>>>>
>>>>> On Friday, July 19, 2013, Simon Marlow wrote:
>>>>>
>>>>>> On 18/07/13 14:17, Ryan Newton wrote:
>>>>>>
>>>>>>> The "atomic-primops" library depends on symbols such as
>>>>>>> store_load_barrier and "cas", which are defined in SMP.h. As a
>>>>>>> result, if the program is linked WITHOUT "-threaded", the user
>>>>>>> gets a linker error about undefined symbols.
>>>>>>>
>>>>>>> The specific place it's used is in the 'foreign "C"' bits of this
>>>>>>> .cmm code:
>>>>>>>
>>>>>>> https://github.com/rrnewton/haskell-lockfree-queue/blob/87e63b21b2a6c375e93c30b98c28c1d04f88781c/AtomicPrimops/cbits/primops.cmm
>>>>>>>
>>>>>>> I'm trying to explore hacks that will enable me to pull in those
>>>>>>> functions at compile time, without duplicating a whole bunch of code
>>>>>>> from the RTS. But it's a fragile business.
>>>>>>>
>>>>>>> It seems to me that some of these routines have general utility. In
>>>>>>> future versions of GHC, could we consider linking in those routines
>>>>>>> irrespective of "-threaded"?
>>>>>>>
>>>>>>
>>>>>> We should make the non-THREADED versions EXTERN_INLINE too, so that
>>>>>> there will be (empty) functions to call in rts/Inlines.c. Want to
>>>>>> submit a patch?
>>>>>>
>>>>>> A better solution would be to make them into primops. You don't
>>>>>> really want to be calling out to a C function to implement a memory
>>>>>> barrier. We have this for write_barrier(), but none of the others so far.
>>>>>> Of course that's a larger change.
>>>>>>
>>>>>> Cheers,
>>>>>> Simon
>>>>>>
>>>>>> _______________________________________________
>>>>>> ghc-devs mailing list
>>>>>> ghc-devs@haskell.org
>>>>>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>>>>>
>>>>>
>>>>
>>>
>>
>
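As an illustration of why the thread keeps returning to pointer equality and the "ticketed" CAS interface, here is a minimal sketch of a CAS retry loop over an IORef, built directly on the casMutVar# primop discussed above. It assumes GHC's documented convention that casMutVar# returns 0# in its Int# slot when the swap happened, and it feeds the value returned by a failed CAS back in as the next expected value, so every comparison is against the exact heap object that was observed (the idea behind tickets). casModifyIORef, hammer and casCounterDemo are hypothetical names, not functions from base or atomic-primops.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- Sketch only: a CAS retry loop on an IORef using GHC's casMutVar# primop.
-- Assumes casMutVar# returns 0# in its Int# slot when the swap happened.
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM_)
import GHC.Exts (casMutVar#)
import GHC.IO (IO (..))
import GHC.IORef (IORef (..), newIORef, readIORef)
import GHC.STRef (STRef (..))

-- Hypothetical helper: atomically replace the contents with (f old) using
-- a CAS retry loop, returning the value that was replaced.
casModifyIORef :: IORef a -> (a -> a) -> IO a
casModifyIORef ref@(IORef (STRef var)) f = readIORef ref >>= loop
  where
    -- 'expected' is the exact heap object last observed (our "ticket").
    loop expected = do
      let new = f expected
      -- Force 'new' first so the IORef never holds a growing chain of thunks.
      outcome <- new `seq` tryCas expected new
      case outcome of
        Nothing      -> return expected  -- our value was installed
        Just current -> loop current     -- lost the race; retry against 'current'

    -- One CAS attempt: Nothing on success, Just <current contents> on failure.
    tryCas expected new = IO $ \s ->
      case casMutVar# var expected new s of
        (# s', flag, current #) -> case flag of
          0# -> (# s', Nothing #)
          _  -> (# s', Just current #)
-- NOINLINE keeps the loop polymorphic, so 'expected' is never unboxed and
-- re-boxed into a different pointer by the optimizer.
{-# NOINLINE casModifyIORef #-}

-- Run n threads that each perform k actions, then wait for all of them.
hammer :: Int -> Int -> IO a -> IO ()
hammer n k action = do
  done <- newEmptyMVar
  forM_ [1 .. n] $ \_ -> forkIO (replicateM_ k action >> putMVar done ())
  replicateM_ n (takeMVar done)

-- Usage: n threads x k CAS-based increments on a shared IORef Int.
casCounterDemo :: Int -> Int -> IO Int
casCounterDemo n k = do
  ref <- newIORef (0 :: Int)
  hammer n k (casModifyIORef ref (+ 1))
  readIORef ref
```

casCounterDemo 4 10000 should return 40000. If the optimizer did re-box expected between the read and the CAS, the pointer test would fail spuriously and the loop would degrade into repeated retries, which is the kind of hazard the ticketed interface is meant to rule out.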