One point I'm getting from this discussion is that perhaps not much time has been spent considering these issues in GHC's backends. If so, it's probably a good thing to work through them now.
For myself, I guess the only option I have now is to measure using loadLoadBarrier and see whether it's better or worse than calling atomicModifyIORef.

On Dec 31, 2013 6:42 AM, "Edward Z. Yang" <ezy...@mit.edu> wrote:

> I was thinking about my response, and realized there was one major
> misleading thing in my description. The load reordering I described
> applies to load instructions in C-- proper, i.e. things that show up
> in the C-- dump as:
>
>     W_ x = I64[...addr...]
>
> Reads to IORefs and reads to vectors get compiled inline (as they
> eventually translate into inline primops), so my admonitions are
> applicable.
>
> However, the story with *foreign primops* (which is how loadLoadBarrier
> in atomic-primops is defined, and how you might imagine defining a
> custom read function as a primop) is a little different. First, what
> does a call to a foreign primop compile into? It is *not* inlined, so
> it will eventually get compiled into a jump (this could be a problem if
> you're really trying to squeeze out performance!). Second, the optimizer
> is a bit more conservative when it comes to primop calls (internally
> referred to as "unsafe foreign calls"); at the moment, the optimizer
> assumes these foreign calls clobber heap memory, so we *automatically*
> will not push loads/stores beyond this boundary. (NB: We reserve the
> right to change this in the future!)
>
> This is probably why atomic-primops, as it is written today, seems to
> work OK, even in the presence of the optimizer. But I also have a hard
> time believing it gives the speedups you want, due to the current
> design. (CC'd Ryan Newton, because I would love to be wrong here, and
> maybe he can correct me on this note.)
>
> Cheers,
> Edward
>
> P.S. loadLoadBarrier compiles to a no-op on x86 architectures, but
> because it's not inlined I think you will still end up with a jump
> (LLVM might be able to eliminate it).
>
> Excerpts from John Lato's message of 2013-12-31 03:01:58 +0800:
> > Hi Edward,
> >
> > Thanks very much for this reply, it answers a lot of questions I'd had.
> > I'd hoped that ordering would be preserved through C--, but c'est la
> > vie. Optimizing compilers are ever the bane of concurrent algorithms!
> >
> > stg/SMP.h does define a loadLoadBarrier, which is exposed in Ryan
> > Newton's atomic-primops package. From the docs, I think that's a
> > general read barrier, and should do what I want. Assuming it works
> > properly, of course. If I'm lucky it might even be optimized out.
> >
> > Thanks,
> > John
> >
> > On Mon, Dec 30, 2013 at 6:04 AM, Edward Z. Yang <ezy...@mit.edu> wrote:
> >
> > > Hello John,
> > >
> > > Here are some prior discussions (which I will attempt to summarize
> > > below):
> > >
> > > http://www.haskell.org/pipermail/haskell-cafe/2011-May/091878.html
> > > http://www.haskell.org/pipermail/haskell-prime/2006-April/001237.html
> > > http://www.haskell.org/pipermail/haskell-prime/2006-March/001079.html
> > >
> > > The guarantees that Haskell and GHC give in this area are hand-wavy at
> > > best; at the moment, I don't think Haskell or GHC have a formal memory
> > > model. This seems to be an open research problem. (Unfortunately,
> > > AFAICT all the researchers working on relaxed memory models have their
> > > hands full with things like C++ :-)
> > >
> > > If you want to go ahead and build something that /just/ works for a
> > > /specific version/ of GHC, you will need to answer this question
> > > separately for every phase of the compiler. For Core and STG, monads
> > > will preserve ordering, so there is no trouble. However, for C--, we
> > > will almost certainly apply optimizations which reorder reads (look at
> > > CmmSink.hs). To properly support your algorithm, you will have to add
> > > some new read barrier mach-ops, and teach the optimizer to respect
> > > them.
> > > (This could be fiendishly subtle; it might be better to give C-- a
> > > memory model first.) These mach-ops would then translate into
> > > appropriate arch-specific assembly or LLVM instructions, preserving
> > > the guarantees further.
> > >
> > > This is not related to your original question, but the situation is a
> > > bit better with regard to reordering stores: we have a WriteBarrier
> > > MachOp, which, in principle, prevents store reordering. In practice, we
> > > don't seem to actually have any C-- optimizations that reorder stores.
> > > So, at least you can assume these will work OK!
> > >
> > > Hope this helps (and is not too inaccurate),
> > > Edward
> > >
> > > Excerpts from John Lato's message of 2013-12-20 09:36:11 +0800:
> > > > Hello,
> > > >
> > > > I'm working on a lock-free algorithm that's meant to be used in a
> > > > concurrent setting, and I've run into a possible issue.
> > > >
> > > > The crux of the matter is that a particular function needs to
> > > > perform the following:
> > > >
> > > > > x <- MVector.read vec ix
> > > > > position <- readIORef posRef
> > > >
> > > > and the algorithm is only safe if these two reads are not reordered
> > > > (both the vector and IORef are written to by other threads).
> > > >
> > > > My concern is, according to standard Haskell semantics this should
> > > > be safe, as IO sequencing should guarantee that the reads happen
> > > > in order. Of course this also relies upon the architecture's memory
> > > > model, but x86 also guarantees that reads happen in order. However,
> > > > doubts remain; I do not have confidence that the code generator will
> > > > handle this properly. In particular, LLVM may freely re-order loads
> > > > of NotAtomic and Unordered values.
> > > >
> > > > The one hope I have is that ghc will preserve IO semantics through
> > > > the entire pipeline. This seems like it would be necessary for
> > > > proper handling of exceptions, for example.
> > > > So, can anyone tell me if my worries are unfounded, or if there's
> > > > any way to ensure the behavior I want? I could change the readIORef
> > > > to an atomicModifyIORef, which should issue an mfence, but that
> > > > seems a bit heavy-handed as just a read fence would be sufficient
> > > > (although even that seems more than necessary).
> > > >
> > > > Thanks,
> > > > John L.
> > > >
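
For readers following along, here is a minimal sketch of the two variants from the original question: the plain pair of reads, and the heavier fallback that replaces readIORef with an atomicModifyIORef' that writes the value back unchanged. It assumes only base and the array package bundled with GHC. The names readPairPlain and readPairFenced are hypothetical, and an IOArray stands in for the MVector so that no extra packages are needed. It only shows where the fenced read would go; it does not demonstrate any ordering guarantee, which, per the thread, depends on the GHC version and backend.

```haskell
import Data.Array.IO (IOArray, newListArray, readArray)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- Hypothetical helper: the two reads exactly as in the original question.
-- An IOArray is used here as a stand-in (assumption) for the MVector.
readPairPlain :: IOArray Int Int -> IORef Int -> Int -> IO (Int, Int)
readPairPlain vec posRef ix = do
  x <- readArray vec ix
  position <- readIORef posRef
  pure (x, position)

-- Hypothetical helper: the "heavy-handed" variant. The IORef read becomes
-- an atomic read-modify-write that stores the same value back and returns it.
readPairFenced :: IOArray Int Int -> IORef Int -> Int -> IO (Int, Int)
readPairFenced vec posRef ix = do
  x <- readArray vec ix
  position <- atomicModifyIORef' posRef (\p -> (p, p))
  pure (x, position)

main :: IO ()
main = do
  vec <- newListArray (0, 3) [10, 20, 30, 40]
  posRef <- newIORef 2
  plain <- readPairPlain vec posRef 1
  fenced <- readPairFenced vec posRef 1
  -- Single-threaded, so both variants observe the same values.
  print (plain, fenced)
```

Timing the two helpers in a tight loop, alongside a version that calls loadLoadBarrier from atomic-primops between the reads, would be one way to carry out the measurement mentioned at the top of this message.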
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users