Sorry for the long delay in responding to this message---this issue takes all the brain cells I've got in one go.

Ordinarily I'd trim the foregoing discussion, but it's rusty enough that I've retained it:

On Apr 4, 2006, at 7:12 AM, Simon Marlow wrote:

Jan-Willem - thanks for your thoughts on this, it's greatly appreciated.

On 31 March 2006 18:49, Jan-Willem Maessen wrote:

John -

You are, in effect, proposing a memory model for MVars and IORefs.
The high-level model for programmers is "In order to communicate data
between threads, you *must* use an MVar, and never an IORef."

But the devil is in the details.  I'd like to strongly urge *against*
adopting the extremely loose model you have proposed.  The following
things seem particularly important:

* reads and writes to IORefs should be atomic, meaning either a
complete update is observed or no change is observed.  In the absence
of this guarantee, misuse of IORefs can cause programs to crash in
unrepeatable ways.  If the machine doesn't make this easy, the
implementor ought to sweat a little so that Haskell programmers don't
have to sweat at all.

* I assume forkIO constitutes a sequence point.  I suspect throwTo et
al ought to as well.

* I would urge that atomicModifyIORef constitute a sequence point---I
suspect it loses a great deal of its utility otherwise.
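
As a throwaway illustration of that utility (a sketch, untested, using
nothing beyond the standard Data.IORef and Control.Concurrent APIs): a
counter bumped from two threads needs no other locking when every
increment goes through atomicModifyIORef; the MVars below exist only so
the main thread can wait for the workers.

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Data.IORef

counterDemo :: IO Int
counterDemo = do
  r    <- newIORef (0 :: Int)
  done <- newEmptyMVar
  let worker = do
        mapM_ (\_ -> atomicModifyIORef r (\n -> (n + 1, ()))) [1 .. 1000 :: Int]
        putMVar done ()
  _ <- forkIO worker
  _ <- forkIO worker
  takeMVar done
  takeMVar done
  readIORef r        -- 2000, because no increment can be lost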

Now, on to more difficult issues...  Consider the following example
(untested):

import Data.IORef
import Control.Concurrent (forkIO)

data RefList a = Nil | Cons a (IORef (RefList a))

cons :: a -> RefList a -> IO (RefList a)
cons x xs = do
   a <- newIORef xs
   return (Cons x a)

hd :: RefList a -> a
hd (Cons a _) = a

tl :: RefList a -> IO (RefList a)
tl (Cons _ t) = readIORef t

setTl :: RefList a -> RefList a -> IO ()
setTl (Cons _ t) t' = writeIORef t t'

main :: IO ()
main = do
  a <- cons 'a' Nil
  forkIO $ do
    c <- cons 'c' Nil
    b <- cons 'b' Nil
    setTl b c
    setTl a b
  at <- tl a
  case at of
    Nil -> return ()
    Cons _ _ -> do
      putChar (hd at)
      att <- tl at
      return ()          -- what, if anything, can we say about att?

This program is, by your informal model, buggy.  The question is
this: how badly wrong is it?
Let's say at happens to be b.  Is (hd at) well defined?  That's
assuming very strong consistency from the memory system already.  How
about the IORef in at?  Is that fully allocated, and properly
initialized?  Again, if it is, that implies some pretty strong
consistency from the memory system.

Now, what about att?  By your argument, it may or may not be c.  We
can ask the same questions about its contents assuming it happens to
be c.

People have talked a lot about weakly-ordered NUMA machines for more
than a decade, and they're always just a couple of years away.  In
practical terms, non-atomic NUMA memory models tend to be so hard to
program that these machines have never found any traction---you need
to throw away all of your software, including your OS, and start
afresh with programmers that are vastly more skilled than the ones
who wrote the stuff you've already got.

My feeling is that the purely-functional portion of the Haskell
language already makes pretty stringent demands of memory
consistency.

This is true - in GHC we are required to add a memory barrier to thunk
update on architectures that don't have strong memory ordering, just to
ensure that when you follow the pointer in an indirection you can
actually see the value at the end of the pointer.

Since x86 & x86_64 can implement strong memory ordering without
(seemingly) too much overhead, surely adding the barrier instruction
for other architectures shouldn't impose too much of a penalty, at
least in theory?

Interesting question. The currently-popular architectures can get by without too many memory barriers, in large part by requiring stores to commit to memory in order; my belief is that SPARC TSO can get by with no memory barriers for thunk update/read, and that PowerPC requires a write barrier (and perhaps read barriers).

It remains to be seen whether multi-core pipelines will change this equation; there are reasons an architect might prefer to use a single store pipeline for multiple threads, satisfying loads from one thread from pending stores for another thread. The practical upshot would be weaker memory models all around.

Sadly, x86 has a bad record of bungling synchronization operations, and clear documentation on the x86 memory model is conspicuous by its absence.

In light of those demands, and the fact that mutable
state is used in pretty tightly-controlled ways, it's worth
considering much stronger memory models than the one you propose.
I'd even go so far as to say "IORefs and IOArrays are sequentially
consistent".

Certainly possible; again on x86 & x86_64 it's a no-op, on other
architectures it means adding a barrier to writeIORef.  In GHC we're
already doing a write barrier (of the generational GC kind, not the
microprocessor kind) in writeIORef anyway.

It is certainly my hope that the memory barriers required by writeIORef and company will be no worse than those required by thunk update---i.e., writeIORef should cost about as much as updating the header word of a thunk.
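
To make "sequentially consistent" concrete, the standard store-buffering
litmus test, phrased with IORefs, is exactly the behaviour such a
guarantee rules out (a sketch, untested; the MVars only collect the
results and play no part in ordering the IORef operations):

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Data.IORef

-- Under sequential consistency the result (0, 0) is impossible:
-- whichever write comes first in the global order is seen by the other
-- thread's read.  Weaker models (x86's store buffering included) do
-- allow (0, 0).
storeBuffering :: IO (Int, Int)
storeBuffering = do
  x  <- newIORef (0 :: Int)
  y  <- newIORef (0 :: Int)
  m1 <- newEmptyMVar
  m2 <- newEmptyMVar
  _ <- forkIO $ do
         writeIORef x 1
         readIORef y >>= putMVar m1
  _ <- forkIO $ do
         writeIORef y 1
         readIORef x >>= putMVar m2
  r1 <- takeMVar m1
  r2 <- takeMVar m2
  return (r1, r2)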

The only argument against this behavior is their use in
the internals of arrays, file I/O, the FFI, etc., etc. (though really
it's all about IOUArrays in the latter cases) where we might
conceivably pay a bundle in performance.

Another possibility is an algebraic model based on commuting IO
actions.  That approach is a particular bias of mine, having tangled
with these issues extensively in the past.  It'd go something like this:
   * Any data written to an IORef can safely be read by another thread;
     we cannot observe partially-written objects.
   * readIORef commutes with readIORef.
   * newIORef commutes with newIORef.
   * writeIORef and newIORef commute with writeIORef or readIORef to a
     different IORef.
   * Nothing commutes with readMVar, writeMVar, or atomicModifyIORef.
   * Nothing before a forkIO can be commuted to after forkIO.
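
The force of the last two rules shows up in the usual flag-publication
idiom (a sketch, untested):

import Control.Concurrent (forkIO, yield)
import Data.IORef

-- The two writes in the forked thread go to *different* IORefs, so by
-- the rules above they commute: the reader may observe ready == True
-- while val still holds 0.  Making the flag an MVar, or folding the
-- publication into an atomicModifyIORef, restores the intended order,
-- because nothing commutes past those operations.
brokenPublish :: IO Int
brokenPublish = do
  val   <- newIORef (0 :: Int)
  ready <- newIORef False
  _ <- forkIO $ do
         writeIORef val 42
         writeIORef ready True
  let spin = do
        ok <- readIORef ready
        if ok then readIORef val else yield >> spin
  spin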

Does this model mean anything to the runtime, or would it just affect
compile-time optimisations?

On weakly ordered machines, it tells us where we must insert memory barriers, and what sort of memory barriers are required. In practice, the easiest expedient is to bake those barriers in to either the read or write operations. On some machines, these operations are no-ops and get erased. On others, one can use commutativity to hoist barriers, then take advantage of the fact that:

barrier >> barrier === barrier

I imagine that, since the runtime still has to use barriers to prevent
partially-written objects from being visible to other threads, in effect
the runtime would end up providing full serialisation anyway.  But my
tiny brain hasn't quite the capacity to think this through completely
right now; I'm hoping someone else has.

That is my belief as well. But the synchronization may not be as "full" as you imagine---a good thing for the implementor, but a potentially surprising thing for the programmer.

My particular concern here is actually array construction. Conceptually, we'd like to avoid a barrier operation every time we write a pointer into an IOArray or an STArray (the barrier ensures that the data being stored in the array is properly formatted in memory before the update occurs). I suspect this can be avoided, but it's tricky to come up with a graceful way to guarantee it in general. In the worst case, this would make constructing any kind of array slow and clunky. That's an eventuality to think about, and avoid. It might mean re-coding bulk updates to use something more primitive (and unsafe) than writeIOArray / writeSTArray.
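
Concretely, the sort of bulk update I have in mind might look like this
(a sketch, untested; unsafeWrite lives in Data.Array.Base, takes a raw
zero-based offset into the underlying array, and skips the bounds check;
the speculation is that it could also skip a per-element barrier,
leaving one barrier to be paid after the whole loop):

import Data.Array.Base (unsafeWrite)
import Data.Array.IO (IOArray)

-- Fill an IOArray from a list using the primitive, unchecked write.
-- The caller is responsible for the list not outrunning the array.
fillFrom :: IOArray Int a -> [a] -> IO ()
fillFrom arr xs = mapM_ (uncurry (unsafeWrite arr)) (zip [0 ..] xs)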

I think it's a Good Idea to choose a model that is conceptually
simple now, at the cost of imposing a few constraints on
implementors, rather than a complex specification which permits
maximum implementation flexibility but is utterly opaque.

I don't have a strong opinion, since as I said earlier the constraints
aren't that onerous in practice.

However, I don't completely understand why the more flexible model would
be "complex" and "opaque".  Isn't it just a case of specifying certain
interactions as resulting in undefined behaviour? Or do you think it's
too hard to specify exactly which interactions are undefined?

My assumption here is that bad synchronization should not cause Haskell to crash with a seg fault (because we looked at something in a partially-formed state). Getting a loose spec which still guarantees this is tricky in practice. The other tricky part is explaining to programmers why their code went wrong in some unintuitive way. Here, saying "use MVars / STRefs to synchronize" should go a long way.
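
For instance, the example earlier in this message becomes unremarkable
once the forked thread hands its work back through an MVar (a sketch,
untested, reusing RefList, cons, hd, tl and setTl from above):

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- The putMVar / takeMVar pair is the synchronisation point: everything
-- the forked thread built before the putMVar is fully visible to the
-- main thread after the takeMVar, so hd, tl and the IORef contents are
-- all well defined.
mainSync :: IO ()
mainSync = do
  a    <- cons 'a' Nil
  done <- newEmptyMVar
  _ <- forkIO $ do
         c <- cons 'c' Nil
         b <- cons 'b' Nil
         setTl b c
         setTl a b
         putMVar done ()
  takeMVar done
  at <- tl a
  case at of
    Nil      -> return ()
    Cons _ _ -> putChar (hd at)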

-Jan


Cheers,
        Simon

_______________________________________________
Haskell-prime mailing list
Haskell-prime@haskell.org
http://haskell.org/mailman/listinfo/haskell-prime
