On 30 January 2017 at 22:56, Simon Peyton Jones <simo...@microsoft.com> wrote:
> We don’t want to do this on a per-module basis do we, as > -fatomic-eager-blackholing would suggest. Rather, on per-thunk basis, no? > Which thunks, precisely? I think perhaps *precisely thunks one of whose > free variables has type (Sttate# s) for some s.* These are thunks that > consume a state token, and must do so no more than once. > If we could identify exactly the thunks we wanted to be atomic, then yes, that would be better than a whole-module solution. However I'm not sure how to do that - doing it on the basis of a free variable with State# type doesn't work if the State# is buried in a data structure or a function closure, for instance. > If entering such thunks was atomic, could we kill off noDuplicate#? > > > > I still don’t understand exactly what noDuplicate# does, what problem it > solves, and how the problem it solves relates to this LazyST problem. > > Back in our "Haskell on a Shared Memory Multiprocessor" paper ( http://simonmar.github.io/bib/papers/multiproc.pdf) we described a scheme to try to avoid duplication of work when multiple cores evaluate the same thunk. This is normally applied lazily, because it involves walking the stack and atomically black-holing thunks pointed to by update frames. The noDuplicate# primop just invokes the stack walk immediately; the idea is to try to prevent multiple threads from evaluating a thunk containing unsafePerformIO. It's expensive. It's also not foolproof, because if you already happened to create two copies of the unsafePerformIO thunk then noDuplicate# can't help. I've never really liked it for these reasons, but I don't know a better way. We have unsafeDupablePerformIO that doesn't call noDuplicate#, and the programmer can use when the unsafePerformIO can safely be executed multiple times. > > > We need some kind of fix for 8.2. Simon what do you suggest? > David's current fix would be OK (along with a clear notice in the release notes etc. to note that the implementation got slower). I think -fatomic-eager-blackholing might "fix" it with less overhead, though. Ben's suggestion: > eagerlyBlackhole :: a -> a is likely to be unreliable I think. We lack the control in the source language to tie it to a particular thunk. Cheers Simon > > Simon > > > > *From:* Simon Marlow [mailto:marlo...@gmail.com] > *Sent:* 30 January 2017 21:51 > *To:* David Feuer <da...@well-typed.com> > *Cc:* Simon Peyton Jones <simo...@microsoft.com>; ghc-devs@haskell.org > *Subject:* Re: Lazy ST vs concurrency > > > > On 30 January 2017 at 16:18, David Feuer <da...@well-typed.com> wrote: > > I forgot to CC ghc-devs the first time, so here's another copy. > > > I was working on #11760 this weekend, which has to do with concurrency > breaking lazy ST. I came up with what I thought was a pretty decent > solution ( > https://phabricator.haskell.org/D3038 ). Simon Peyton Jones, however, is > quite > unhappy about the idea of sticking this weird unsafePerformIO-like code > (noDup, which I originally implemented as (unsafePerformIO . evaluate), but > which he finds ugly regardless of the details) into fmap and (>>=). He's > also > concerned that the noDuplicate# applications will kill performance in the > multi-threaded case, and suggests he would rather leave lazy ST broken, or > even remove it altogether, than use a fix that will make it slow sometimes, > particularly since there haven't been a lot of reports of problems in the > wild. > > > > In a nutshell, I think we have to fix this despite the cost - the > implementation is incorrect and unsafe. > > > > Unfortunately the mechanisms we have right now to fix it aren't ideal - > noDuplicate# is a bigger hammer than we need. All we really need is some > way to make a thunk atomic, it would require some special entry code to the > thunk which did atomic eager-blackholing. Hmm, now that I think about it, > perhaps we could just have a flag, -fatomic-eager-blackholing. We already > do this for CAFs, incidentally. The idea is to compare-and-swap the > blackhole info pointer into the thunk, and if we didn't win the race, just > re-enter the thunk (which is now a blackhole). We already have the cmpxchg > MachOp, so It shouldn't be more than a few lines in the code generator to > implement it. It would be too expensive to do by default, but doing it > just for Control.Monad.ST.Lazy should be ok and would fix the unsafety. > > > > (I haven't really thought this through, just an idea off the top of my > head, so there could well be something I'm overlooking here...) > > > > Cheers > > Simon > > > > > > My view is that leaving it broken, even if it only causes trouble > occasionally, is simply not an option. If users can't rely on it to always > give correct answers, then it's effectively useless. And for the sake of > backwards compatibility, I think it's a lot better to keep it around, even > if > it runs slowly multithreaded, than to remove it altogether. > > Note to Simon PJ: Yes, it's ugly to stick that noDup in there. But lazy ST > has > always been a bit of deep magic. You can't *really* carry a moment of time > around in your pocket and make its history happen only if necessary. We can > make it work in GHC because its execution model is entirely based around > graph > reduction, so evaluation is capable of driving execution. Whereas lazy IO > is > extremely tricky because it causes effects observable in the real world, > lazy > ST is only *moderately* tricky, causing effects that we have to make sure > don't lead to weird interactions between threads. I don't think it's > terribly > surprising that it needs to do a few more weird things to work properly. > > David > > >
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs