I wrote a lazy ST microbenchmark (http://lpaste.net/351799) that uses nothing but lazy ST monad operations in the inner loop. With various caveats, it took around 3 times as long to run under +RTS -N2 after applying https://phabricator.haskell.org/D3038. The biggest caveat is that the cost of the `threadPaused` in `noDuplicate#` seems to be potentially proportional to the thread's stack depth, and I'm not sure how representative my microbenchmark is in that regard.
I'm actually surprised the `noDuplicate#` version isn't an order of magnitude or so slower than that. Still, a 3x factor is a large price to pay. I don't yet understand what's going on here clearly enough to be sure that the `noDuplicate#` is necessary, or that we can't implement `noDuplicate#` more cheaply in the common case of no contention. My feeling is that if it turns out that we can implement the correct behavior cheaply, then it will be better to have left it broken for a little while than to first have a correct but slow implementation and then later replaced it with a correct and fast implementation. The latter is disruptive to two groups of people, those who are affected by the bug and also those who cannot afford to have their lazy ST code run 3 times slower; of which the former group is affected already, and we can advertise the existence of the bug until we have a workable solution. So I'm reluctant to go down this `noDuplicate#` path until we have exhausted our other options. In an ideal world with no users, it would be better to start with correct but slow, of course. Regards, Reid Barton On Mon, Jan 30, 2017 at 11:18 AM, David Feuer <da...@well-typed.com> wrote: > I forgot to CC ghc-devs the first time, so here's another copy. > > I was working on #11760 this weekend, which has to do with concurrency > breaking lazy ST. I came up with what I thought was a pretty decent solution ( > https://phabricator.haskell.org/D3038 ). Simon Peyton Jones, however, is quite > unhappy about the idea of sticking this weird unsafePerformIO-like code > (noDup, which I originally implemented as (unsafePerformIO . evaluate), but > which he finds ugly regardless of the details) into fmap and (>>=). He's also > concerned that the noDuplicate# applications will kill performance in the > multi-threaded case, and suggests he would rather leave lazy ST broken, or > even remove it altogether, than use a fix that will make it slow sometimes, > particularly since there haven't been a lot of reports of problems in the > wild. > > My view is that leaving it broken, even if it only causes trouble > occasionally, is simply not an option. If users can't rely on it to always > give correct answers, then it's effectively useless. And for the sake of > backwards compatibility, I think it's a lot better to keep it around, even if > it runs slowly multithreaded, than to remove it altogether. > > Note to Simon PJ: Yes, it's ugly to stick that noDup in there. But lazy ST has > always been a bit of deep magic. You can't *really* carry a moment of time > around in your pocket and make its history happen only if necessary. We can > make it work in GHC because its execution model is entirely based around graph > reduction, so evaluation is capable of driving execution. Whereas lazy IO is > extremely tricky because it causes effects observable in the real world, lazy > ST is only *moderately* tricky, causing effects that we have to make sure > don't lead to weird interactions between threads. I don't think it's terribly > surprising that it needs to do a few more weird things to work properly. > > David > _______________________________________________ > ghc-devs mailing list > ghc-devs@haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs