Re: Feasibility of native RTS support for continuations?
As a small update on this for anyone following along, I submitted a GHC proposal about a week ago to add the discussed primops (albeit with some tweaked names). For those who haven’t seen it already, the pull request is here: https://github.com/ghc-proposals/ghc-proposals/pull/313 So far, the reception has been quite positive, so I’m optimistic about getting these added. Of course, if anyone has any concerns, please voice them in the PR thread! Thanks, Alexis ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Feasibility of native RTS support for continuations?
> On Feb 10, 2020, at 02:18, Simon Marlow wrote: > >> On Mon, 10 Feb 2020 at 08:17, Simon Marlow wrote: >> >> Let me just say "unsafePerformIO" :) You probably want to at least ensure >> that things don't crash in that case, even if you can't give a sensible >> semantics to what actually happens. We have a similar situation with >> unsafeIOToST - we can't tell you exactly what it does in general, except >> that it doesn't crash (I hope!). > > Typo - I meant unsafeIOToSTM here. I’ve been thinking about this. At first I figured you were probably right, and I decided I’d switch to a more raiseAsync-like approach. But once I started trying to implement it, I became less convinced it makes sense. As its name implies, an AP_STACK is sort of like a saturated application, where the stack itself is the “function.” Extending the analogy, a continuation would be a PAP_STACK, since the stack is awaiting a value. This difference is significant. Suppose you write: let x = 1 + unsafePerformIO (shift f >>= g) in ... If you force x, the stack will unwind to the nearest reset. When you unwind past x’s UPDATE_FRAME, you can’t replace the blackhole with an AP_STACK, since the captured slice of the stack represents the expression \m -> 1 + unsafePerformIO (m >>= g) which is a function, not a thunk. The only logical interpretation of this situation is that x is a thunk quite like let y = 1 + error "bang" except that it doesn’t just abort to the enclosing reset frame when forced, it actually composes the captured continuation with the upper frames of the current stack and passes the composed continuation to f to resume computation. That’s an awful lot of trouble given the resulting semantics is going to be unpredictable, anyway! I should be clear that I do not intend shift/reset to have any safe interface in IO directly. Even if you could make it type safe, it would break all kinds of code that manages resources using bracket. Rather, I have built a library that defines a totally separate Eff monad, and that monad uses the primops directly, wrapped in a safe interface. It isn’t possible for a user to screw things up using unsafePerformIO because there is no way to call shift from IO (unless you wrap shift# in IO yourself, but then I think you can be expected to know what you’re getting yourself into). So without mucking about with GHC.Exts, you still can’t get segfaults. Alexis ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Feasibility of native RTS support for continuations?
On Mon, 10 Feb 2020 at 08:17, Simon Marlow wrote: > On Mon, 10 Feb 2020 at 08:10, Alexis King wrote: > >> On Feb 6, 2020, at 02:28, Simon Marlow wrote: >> >> The issue here is that raiseAsync is destructive - it *moves* the stack >> to the heap, rather than copying it. So if you want to continue execution >> where you left off, for shift#, you would have to copy it back onto the >> stack again. That's the point I was trying to highlight here. >> >> >> Ah, yes, I see what you mean! It happens that for my use case I actually >> do want to unwind the stack when I capture a continuation, so that isn’t a >> problem for me. >> >> Yes, these are all the things that make raiseAsync tricky! You can either >> copy what raiseAsync does (but be warned, it has taken a lot of iteration >> to get right) or try to use raiseAsync and/or modify it to do what you want. >> >> >> My point was more that I’m unsure that shift# *should* handle most of >> those cases. For raiseAsync, it makes sense, since asynchronous interrupts >> can, by their nature, occur at any time, even during pure code. But my >> shift# operation lives in IO, and the intent is to only capture up to a >> reset# in the same state thread. >> >> My justification for this is that if you could use shift# in pure code, >> it would be ill-defined what you’d even be capturing. Suppose you return a >> thunk containing a call to shift#. When the thunk is evaluated, you capture >> up to the nearest reset#… but who knows what that is now? This opens you up >> to all sorts of general badness. >> >> Therefore, I don’t think there should ever be an UPDATE_FRAME in the >> captured continuation—if there is, it’s probably a bug. So unless someone >> can think of any valid use cases, I’ll make that more explicit by modifying >> the continuation-capturing code to add some assertions that those frames >> never appear in the captured stack. >> > > Let me just say "unsafePerformIO" :) You probably want to at least ensure > that things don't crash in that case, even if you can't give a sensible > semantics to what actually happens. We have a similar situation with > unsafeIOToST - we can't tell you exactly what it does in general, except > that it doesn't crash (I hope!). > Typo - I meant unsafeIOToSTM here. > > Cheers > Simon > ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Feasibility of native RTS support for continuations?
On Mon, 10 Feb 2020 at 08:10, Alexis King wrote: > On Feb 6, 2020, at 02:28, Simon Marlow wrote: > > The issue here is that raiseAsync is destructive - it *moves* the stack to > the heap, rather than copying it. So if you want to continue execution > where you left off, for shift#, you would have to copy it back onto the > stack again. That's the point I was trying to highlight here. > > > Ah, yes, I see what you mean! It happens that for my use case I actually > do want to unwind the stack when I capture a continuation, so that isn’t a > problem for me. > > Yes, these are all the things that make raiseAsync tricky! You can either > copy what raiseAsync does (but be warned, it has taken a lot of iteration > to get right) or try to use raiseAsync and/or modify it to do what you want. > > > My point was more that I’m unsure that shift# *should* handle most of > those cases. For raiseAsync, it makes sense, since asynchronous interrupts > can, by their nature, occur at any time, even during pure code. But my > shift# operation lives in IO, and the intent is to only capture up to a > reset# in the same state thread. > > My justification for this is that if you could use shift# in pure code, it > would be ill-defined what you’d even be capturing. Suppose you return a > thunk containing a call to shift#. When the thunk is evaluated, you capture > up to the nearest reset#… but who knows what that is now? This opens you up > to all sorts of general badness. > > Therefore, I don’t think there should ever be an UPDATE_FRAME in the > captured continuation—if there is, it’s probably a bug. So unless someone > can think of any valid use cases, I’ll make that more explicit by modifying > the continuation-capturing code to add some assertions that those frames > never appear in the captured stack. > Let me just say "unsafePerformIO" :) You probably want to at least ensure that things don't crash in that case, even if you can't give a sensible semantics to what actually happens. We have a similar situation with unsafeIOToST - we can't tell you exactly what it does in general, except that it doesn't crash (I hope!). Cheers Simon ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Feasibility of native RTS support for continuations?
> On Feb 6, 2020, at 02:28, Simon Marlow wrote: > > The issue here is that raiseAsync is destructive - it *moves* the stack to > the heap, rather than copying it. So if you want to continue execution where > you left off, for shift#, you would have to copy it back onto the stack > again. That's the point I was trying to highlight here. Ah, yes, I see what you mean! It happens that for my use case I actually do want to unwind the stack when I capture a continuation, so that isn’t a problem for me. > Yes, these are all the things that make raiseAsync tricky! You can either > copy what raiseAsync does (but be warned, it has taken a lot of iteration to > get right) or try to use raiseAsync and/or modify it to do what you want. My point was more that I’m unsure that shift# should handle most of those cases. For raiseAsync, it makes sense, since asynchronous interrupts can, by their nature, occur at any time, even during pure code. But my shift# operation lives in IO, and the intent is to only capture up to a reset# in the same state thread. My justification for this is that if you could use shift# in pure code, it would be ill-defined what you’d even be capturing. Suppose you return a thunk containing a call to shift#. When the thunk is evaluated, you capture up to the nearest reset#… but who knows what that is now? This opens you up to all sorts of general badness. Therefore, I don’t think there should ever be an UPDATE_FRAME in the captured continuation—if there is, it’s probably a bug. So unless someone can think of any valid use cases, I’ll make that more explicit by modifying the continuation-capturing code to add some assertions that those frames never appear in the captured stack.___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Feasibility of native RTS support for continuations?
On Sun, 2 Feb 2020 at 04:26, Alexis King wrote: > I took a stab at implementing this today, using the “continuation is a > stack” implementation strategy I described in my previous email. I > haven’t tried very hard to break it yet, but this tiny test program > works: > > {-# LANGUAGE BangPatterns, BlockArguments, MagicHash, > ScopedTypeVariables, UnboxedTuples #-} > > import GHC.Prim > import GHC.Types > > data Continuation a b = Continuation# (Continuation# RealWorld a b) > > reset :: IO a -> IO a > reset (IO m) = IO (reset# m) > > shift :: (Continuation a b -> IO b) -> IO a > shift f = IO (shift# \k -> let !(IO m) = f (Continuation# k) in m) > > applyContinuation :: Continuation a b -> a -> IO b > applyContinuation (Continuation# k) a = IO (applyContinuation# k a) > > main :: IO () > main = do > ns <- reset do > n <- shift \(k :: Continuation Integer [Integer]) -> do > a <- applyContinuation k 2 > b <- applyContinuation k 3 > pure (a ++ b) > pure [n] > print ns > > The output of this program is [2, 3], as expected. > That's impressive! > > My implementation doesn’t share any code with raiseAsync. Currently, it > isn’t very clever: > > * reset# pushes a RET_SMALL frame with a well-known info pointer, > _reset_frame_info. > > * shift# walks the stack and copies it up to the nearest reset > frame. If the stack consists of several chunks, it only copies the > chunk that contains the reset frame, and it just repurposes the > other chunks as the continuation (since the stack is unwinding > anyway). > > * applyContinuation# copies the captured stack and updates the > UNDERFLOW frames as needed to point to the current stack. > > * I haven’t implemented it yet, but it would be straightforward to > implement an applyContinuationOneShot# operation that works like > applyContinuation#, but doesn’t actually copy anything and just > updates the UNDERFLOW frames in the captured stack itself. > > This seems to work in my very simple examples, but there are also things > I know it doesn’t handle properly: > > * It doesn’t make any attempt to handle modifications to the > interrupt masking state properly. The right thing to do here is > probably to look for mask/unmask frames on the stack during > unwinding and to stash that information somewhere. > > * It doesn’t do anything special for UPDATE_FRAMEs, so if there’s an > UPDATE_FRAME that owns a blackhole on the stack, things will go > quite wrong. > > I haven’t been worrying about this because I don’t think there > should ever be any update frames between shift# and reset#. In the > case of raiseAsync, the location of the “prompt” is well-defined: > it’s the update frame. But shift# captures up to an explicit > prompt, so using shift# when there’s an update frame on the stack > can surely only lead to nonsense... right? > > * It doesn’t do anything special for STM frames, so trying to > capture a continuation through those will be similarly broken. > Yes, these are all the things that make raiseAsync tricky! You can either copy what raiseAsync does (but be warned, it has taken a lot of iteration to get right) or try to use raiseAsync and/or modify it to do what you want. Cheers Simon > There are also probably bugs I don’t know about — I haven’t exercised > the implementation very hard yet — but I’ll keep playing with it. If > anyone is at all interested, I’ve pushed the code to a branch here: > > > https://gitlab.haskell.org/lexi.lambda/ghc/compare/master...first-class-continuations > > My thanks again to everyone’s help! > > Alexis ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Feasibility of native RTS support for continuations?
On Sat, 1 Feb 2020 at 00:23, Alexis King wrote: > > On Jan 30, 2020, at 02:35, Simon Marlow wrote: > > > Also you might want to optimise the implementation so that it doesn't > actually tear down the stack as it copies it into the heap, so that you > could avoid the need to copy it back from the heap again in shift#. > > I don’t fully understand this point — do you mean “avoid the need to copy > it back on continuation application”? shift# just copies the stack slice to > the heap, so it doesn’t need to copy it back. > The issue here is that raiseAsync is destructive - it *moves* the stack to the heap, rather than copying it. So if you want to continue execution where you left off, for shift#, you would have to copy it back onto the stack again. That's the point I was trying to highlight here. Cheers Simon > If I was right, and you were referring to continuation application rather > than shift#, I agree with you there. It seems as though it ought to be > possible to represent the stack slice as a stack itself, so if the stack > looks like > > ┌───┐ > │ RET_SMALL │ > ├───┤ > │ CATCH │ > ├───┤ > │ RESET │ > ├───┤ > > then the captured continuation could itself essentially be a stack like > > ┌───┐ > │ RET_SMALL │ > ├───┤ > │ CATCH │ > ├───┤ > │ UNDERFLOW │ > └───┘ > > where restoring a continuation just copies the captured stack and updates > its underflow frame to point at the top of the current stack. If the caller > promises not to use the continuation again after applying it, the copying > could be skipped entirely, and the captured stack could just become the new > stack. > > However, I don’t understand enough about the way the RTS currently works > to know if this is viable. For example, what if I write this: > > reset (mask_ (shift f)) > > Now presumably I want to ensure the masking state is restored when I > invoke the continuation. But it can’t just be unconditionally restored, > since if I write > > mask_ (reset (shift f >>= g)) > > then the mask frame isn’t included on the stack, so applying the > continuation shouldn’t affect the masking state. Presumably this means a > continuation restore can’t be as simple as copying the captured stack > frames onto the current stack, since restoring certain frames affects other > parts of the RTS state. > > (Tangentially, it seems like this particular example is not handled > properly in the existing capture/restore code, as this comment in > Exception.cmm notes: > > NB. there's a bug in here. If a thread is inside an > unsafePerformIO, and inside maskAsyncExceptions# (there is an > unmaskAsyncExceptions_ret on the stack), and it is blocked in an > interruptible operation, and it receives an exception, then the > unsafePerformIO thunk will be updated with a stack object > containing the unmaskAsyncExceptions_ret frame. Later, when > someone else evaluates this thunk, the original masking state is > not restored. > > I think getting this right probably matters more if continuations are > added, so that’s something else to worry about.) > > > So that's shift#. What about reset#? I expect it's something like > `unsafeInterleaveIO`, that is it creates a thunk to name the continuation. > You probably also want a `catch` in there, so that we don't tear down more > of the stack than we need to. > > It would be nice to be able to capture slices of the stack that include > catch frames, though theoretically it isn’t necessary — it would be > possible to arrange for a continuation that wants to capture through a > catch to do something like > > reset (... (catch (reset ...) ...)) > > instead, then call shift twice and compose the two continuations by hand > (inserting another call to catch in between). I don’t know enough yet to > understand all the tradeoffs involved, but I’ll see if it’s possible to get > away with the userland emulation, since I figure the less code in the RTS > the better! > > > Hope this is helpful. > > Very much so, thank you! > > > On Jan 30, 2020, at 10:31, Ben Gamari wrote: > > > > For the record, runtime system captures the stack state in an AP_STACK > > closure. This is done in rts/RaiseAsync.c:raiseAsync and some of this is > > described in the comment attached to that function. > > > > As Simon PJ points out, this is all very tricky stuff, especially in a > > concurrent context. If you make any changes in this area do be sure to > > keep in mind the considerations described in Note [AP_STACKs must be > > eagerly blackholed], which arose out of the very-nast #13615. > > Thank you for the pointers — I did manage to find raiseAsync, but I hadn’t > seen that Note. I will try my best to be suitably wary, though I imagine > I’m in far enough over my head that I don’t yet know half the things I need > to be wary of. :) > > Alexis
Re: Feasibility of native RTS support for continuations?
I took a stab at implementing this today, using the “continuation is a stack” implementation strategy I described in my previous email. I haven’t tried very hard to break it yet, but this tiny test program works: {-# LANGUAGE BangPatterns, BlockArguments, MagicHash, ScopedTypeVariables, UnboxedTuples #-} import GHC.Prim import GHC.Types data Continuation a b = Continuation# (Continuation# RealWorld a b) reset :: IO a -> IO a reset (IO m) = IO (reset# m) shift :: (Continuation a b -> IO b) -> IO a shift f = IO (shift# \k -> let !(IO m) = f (Continuation# k) in m) applyContinuation :: Continuation a b -> a -> IO b applyContinuation (Continuation# k) a = IO (applyContinuation# k a) main :: IO () main = do ns <- reset do n <- shift \(k :: Continuation Integer [Integer]) -> do a <- applyContinuation k 2 b <- applyContinuation k 3 pure (a ++ b) pure [n] print ns The output of this program is [2, 3], as expected. My implementation doesn’t share any code with raiseAsync. Currently, it isn’t very clever: * reset# pushes a RET_SMALL frame with a well-known info pointer, _reset_frame_info. * shift# walks the stack and copies it up to the nearest reset frame. If the stack consists of several chunks, it only copies the chunk that contains the reset frame, and it just repurposes the other chunks as the continuation (since the stack is unwinding anyway). * applyContinuation# copies the captured stack and updates the UNDERFLOW frames as needed to point to the current stack. * I haven’t implemented it yet, but it would be straightforward to implement an applyContinuationOneShot# operation that works like applyContinuation#, but doesn’t actually copy anything and just updates the UNDERFLOW frames in the captured stack itself. This seems to work in my very simple examples, but there are also things I know it doesn’t handle properly: * It doesn’t make any attempt to handle modifications to the interrupt masking state properly. The right thing to do here is probably to look for mask/unmask frames on the stack during unwinding and to stash that information somewhere. * It doesn’t do anything special for UPDATE_FRAMEs, so if there’s an UPDATE_FRAME that owns a blackhole on the stack, things will go quite wrong. I haven’t been worrying about this because I don’t think there should ever be any update frames between shift# and reset#. In the case of raiseAsync, the location of the “prompt” is well-defined: it’s the update frame. But shift# captures up to an explicit prompt, so using shift# when there’s an update frame on the stack can surely only lead to nonsense... right? * It doesn’t do anything special for STM frames, so trying to capture a continuation through those will be similarly broken. There are also probably bugs I don’t know about — I haven’t exercised the implementation very hard yet — but I’ll keep playing with it. If anyone is at all interested, I’ve pushed the code to a branch here: https://gitlab.haskell.org/lexi.lambda/ghc/compare/master...first-class-continuations My thanks again to everyone’s help! Alexis ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Feasibility of native RTS support for continuations?
> On Jan 30, 2020, at 02:35, Simon Marlow wrote: > > My guess is you can almost do what you want with asynchronous exceptions but > some changes to the RTS would be needed. > > There's a bit of code in the IO library that literally looks like this > (https://gitlab.haskell.org/ghc/ghc/blob/master/libraries%2Fbase%2FGHC%2FIO%2FHandle%2FInternals.hs#L175) Thanks for the pointer, I definitely had not discovered that! That is an interesting trick. I think your explanation paired with the Note is enough for it to make sense to me, though I don’t yet understand all the implementation subtleties of raiseAsync itself. > Also you might want to optimise the implementation so that it doesn't > actually tear down the stack as it copies it into the heap, so that you could > avoid the need to copy it back from the heap again in shift#. I don’t fully understand this point — do you mean “avoid the need to copy it back on continuation application”? shift# just copies the stack slice to the heap, so it doesn’t need to copy it back. If I was right, and you were referring to continuation application rather than shift#, I agree with you there. It seems as though it ought to be possible to represent the stack slice as a stack itself, so if the stack looks like ┌───┐ │ RET_SMALL │ ├───┤ │ CATCH │ ├───┤ │ RESET │ ├───┤ then the captured continuation could itself essentially be a stack like ┌───┐ │ RET_SMALL │ ├───┤ │ CATCH │ ├───┤ │ UNDERFLOW │ └───┘ where restoring a continuation just copies the captured stack and updates its underflow frame to point at the top of the current stack. If the caller promises not to use the continuation again after applying it, the copying could be skipped entirely, and the captured stack could just become the new stack. However, I don’t understand enough about the way the RTS currently works to know if this is viable. For example, what if I write this: reset (mask_ (shift f)) Now presumably I want to ensure the masking state is restored when I invoke the continuation. But it can’t just be unconditionally restored, since if I write mask_ (reset (shift f >>= g)) then the mask frame isn’t included on the stack, so applying the continuation shouldn’t affect the masking state. Presumably this means a continuation restore can’t be as simple as copying the captured stack frames onto the current stack, since restoring certain frames affects other parts of the RTS state. (Tangentially, it seems like this particular example is not handled properly in the existing capture/restore code, as this comment in Exception.cmm notes: NB. there's a bug in here. If a thread is inside an unsafePerformIO, and inside maskAsyncExceptions# (there is an unmaskAsyncExceptions_ret on the stack), and it is blocked in an interruptible operation, and it receives an exception, then the unsafePerformIO thunk will be updated with a stack object containing the unmaskAsyncExceptions_ret frame. Later, when someone else evaluates this thunk, the original masking state is not restored. I think getting this right probably matters more if continuations are added, so that’s something else to worry about.) > So that's shift#. What about reset#? I expect it's something like > `unsafeInterleaveIO`, that is it creates a thunk to name the continuation. > You probably also want a `catch` in there, so that we don't tear down more of > the stack than we need to. It would be nice to be able to capture slices of the stack that include catch frames, though theoretically it isn’t necessary — it would be possible to arrange for a continuation that wants to capture through a catch to do something like reset (... (catch (reset ...) ...)) instead, then call shift twice and compose the two continuations by hand (inserting another call to catch in between). I don’t know enough yet to understand all the tradeoffs involved, but I’ll see if it’s possible to get away with the userland emulation, since I figure the less code in the RTS the better! > Hope this is helpful. Very much so, thank you! > On Jan 30, 2020, at 10:31, Ben Gamari wrote: > > For the record, runtime system captures the stack state in an AP_STACK > closure. This is done in rts/RaiseAsync.c:raiseAsync and some of this is > described in the comment attached to that function. > > As Simon PJ points out, this is all very tricky stuff, especially in a > concurrent context. If you make any changes in this area do be sure to > keep in mind the considerations described in Note [AP_STACKs must be > eagerly blackholed], which arose out of the very-nast #13615. Thank you for the pointers — I did manage to find raiseAsync, but I hadn’t seen that Note. I will try my best to be suitably wary, though I imagine I’m in far enough over my head that I don’t yet know
RE: Feasibility of native RTS support for continuations?
Simon Peyton Jones via ghc-devs writes: > | Now that is very interesting, and certainly not something I would have > | expected! Why would asynchronous exceptions need to capture any portion of > | the stack? Exceptions obviously trigger stack unwinding, so I assumed the > | “abort to the current prompt” part of my implementation would already > | exist, but not the “capture a slice of the stack” part. Could you say a > | little more about this, or point me to some relevant code? > > Suppose a thread happens to be evaluating a pure thunk for (factorial 200). > Then it gets an asynchronous exception from another thread. That asynch exn > is nothing to do with (factorial 200). So we could either > > A. revert the thunk to (factorial 200), abandoning all >the work done so far, or > B. capture the stack and attach it to the thunk, so that ie any other >thread enters that thunk, it'll just run that stack. > > Now (A) means that every thunk has to be revertible, which means keeping its > original free variables, which leads to space leaks. And extra work to avoid > losing any info you need for reversion. Extra work is painful; we want to > put all of the extra work on the asynch exn. > > So we do (B). > > See Section 8 of "Asynchronous exceptions in Haskell". > https://www.microsoft.com/en-us/research/publication/asynchronous-exceptions-haskell-3/ > > And "An implementation of resumable black holes" (Reid). > https://alastairreid.github.io/papers/IFL_98/ > > This stack-freezing stuff is definitely implemented. I'm not quite > sure where, but I'm cc'ing Simon Marlow who can point you at it. > For the record, runtime system captures the stack state in an AP_STACK closure. This is done in rts/RaiseAsync.c:raiseAsync and some of this is described in the comment attached to that function. As Simon PJ points out, this is all very tricky stuff, especially in a concurrent context. If you make any changes in this area do be sure to keep in mind the considerations described in Note [AP_STACKs must be eagerly blackholed], which arose out of the very-nast #13615. Cheers and good luck! - Ben signature.asc Description: PGP signature ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Feasibility of native RTS support for continuations?
My guess is you can almost do what you want with asynchronous exceptions but some changes to the RTS would be needed. There's a bit of code in the IO library that literally looks like this ( https://gitlab.haskell.org/ghc/ghc/blob/master/libraries%2Fbase%2FGHC%2FIO%2FHandle%2FInternals.hs#L175 ): t <- myThreadId throwTo t e ... carry on ... that is, it throws an exception to the current thread using throwTo, and then there is code to handle what happens if the enclosing thunk is evaluated after the exception has been thrown. That is, throwing an exception to the current thread is an IO operation that returns later! This only works with throwTo, not with throwIO, because throwIO is a *synchronous* exception that destructively tears down the stack. I suppose if you want to pass a value to the thread after resumption you could do it via an IORef. But the issue with this is that you can only apply the continuation once: GHC treats the captured continuation like a thunk, which means that after evaluating it, it will be updated with its value. But for your purposes you need to be able to apply it at least twice - once because we want to continue after shift#, and again when we apply the continuation later. Somehow the thunks we build this way would need to be marked non-updatable. Perhaps this could be done with a new primitive `throwToNonUpdatable` (hopefully with a better name) that creates non-updatable thunks. Also you might want to optimise the implementation so that it doesn't actually tear down the stack as it copies it into the heap, so that you could avoid the need to copy it back from the heap again in shift#. So that's shift#. What about reset#? I expect it's something like `unsafeInterleaveIO`, that is it creates a thunk to name the continuation. You probably also want a `catch` in there, so that we don't tear down more of the stack than we need to. Hope this is helpful. Cheers Simon On Thu, 30 Jan 2020 at 00:55, Alexis King wrote: > > On Jan 29, 2020, at 03:32, Simon Peyton Jones > wrote: > > > > Suppose a thread happens to be evaluating a pure thunk for (factorial > 200). […] This stack-freezing stuff is definitely implemented. > > That’s fascinating! I had no idea, but your explanation makes sense (as do > the papers you linked). That is definitely promising, as it seems like many > of the tricky cases may already be accounted for? I’ll see if I can follow > the Cmm code well enough to hunt down how it’s implemented. > > One other thing I have been thinking about: this is completely > incompatible with the state hack, isn’t it? That is not a showstopper, of > course—I do not intend to suggest that continuations be capturable in > ordinary IO—but it does mean I probably want a way to selectively opt out. > (But I’ll worry about that if I ever get that far.) > > Alexis ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Feasibility of native RTS support for continuations?
> On Jan 29, 2020, at 03:32, Simon Peyton Jones wrote: > > Suppose a thread happens to be evaluating a pure thunk for (factorial 200). > […] This stack-freezing stuff is definitely implemented. That’s fascinating! I had no idea, but your explanation makes sense (as do the papers you linked). That is definitely promising, as it seems like many of the tricky cases may already be accounted for? I’ll see if I can follow the Cmm code well enough to hunt down how it’s implemented. One other thing I have been thinking about: this is completely incompatible with the state hack, isn’t it? That is not a showstopper, of course—I do not intend to suggest that continuations be capturable in ordinary IO—but it does mean I probably want a way to selectively opt out. (But I’ll worry about that if I ever get that far.) Alexis ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
RE: Feasibility of native RTS support for continuations?
| Now that is very interesting, and certainly not something I would have | expected! Why would asynchronous exceptions need to capture any portion of | the stack? Exceptions obviously trigger stack unwinding, so I assumed the | “abort to the current prompt” part of my implementation would already | exist, but not the “capture a slice of the stack” part. Could you say a | little more about this, or point me to some relevant code? Suppose a thread happens to be evaluating a pure thunk for (factorial 200). Then it gets an asynchronous exception from another thread. That asynch exn is nothing to do with (factorial 200). So we could either A. revert the thunk to (factorial 200), abandoning all the work done so far, or B. capture the stack and attach it to the thunk, so that ie any other thread enters that thunk, it'll just run that stack. Now (A) means that every thunk has to be revertible, which means keeping its original free variables, which leads to space leaks. And extra work to avoid losing any info you need for reversion. Extra work is painful; we want to put all of the extra work on the asynch exn. So we do (B). See Section 8 of "Asynchronous exceptions in Haskell". https://www.microsoft.com/en-us/research/publication/asynchronous-exceptions-haskell-3/ And "An implementation of resumable black holes" (Reid). https://alastairreid.github.io/papers/IFL_98/ This stack-freezing stuff is definitely implemented. I'm not quite sure where, but I'm cc'ing Simon Marlow who can point you at it. You need to be careful. Suppose a thread pushes a prompt, then later evaluates a thunk T1, which in turn evaluates a thunk T2. If you capture the stack down to the prompt, you MUST overwrite T1 and T2 with a resumable continuation capturing their portion of the stack, in case some other, unrelated thread needs their value. But as I say, all this is implemented. --- Keep us posted. It'd be good to have a design that accommodated some of the applications in the 'composable scheduler activations' paper too. Simon | -Original Message- | From: Alexis King | Sent: 28 January 2020 22:19 | To: Simon Peyton Jones | Cc: ghc-devs | Subject: Re: Feasibility of native RTS support for continuations? | | > On Jan 28, 2020, at 04:09, Simon Peyton Jones | wrote: | > | > I've thought about this quite a bit in the past, but got stalled for | lack of cycles to think about it more. But there's a paper or two | | Many thanks! I’d stumbled upon the 2007 paper, but I hadn’t seen the 2016 | one. In the case of the former, I had thought it probably wasn’t | enormously relevant, since the “continuations” appear to be fundamentally | one-shot. At first glance, that doesn’t seem to have changed in the JFP | article, but I haven’t really read it yet, so maybe I’m mistaken. I’ll | take a closer look. | | > On the effects front I think Daan Leijen is doing interesting stuff, | although I'm not very up to date: | > | https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.micr | osoft.com%2Fen- | us%2Fresearch%2Fpeople%2Fdaan%2Fpublications%2Fdata=02%7C01%7Csimonpj | %40microsoft.com%7C1f6ac242e0334d662c8c08d7a4401d95%7C72f988bf86f141af91ab | 2d7cd011db47%7C1%7C0%7C637158467589648051sdata=%2BPgJblk6y%2BjXRc5bA0 | gdzQEjrqgAQB6UYytdw7UtLQQ%3Dreserved=0 | | Indeed, I’ve read a handful of his papers while working on this! I didn’t | mention it in the original email, but I’ve also talked a little with | Matthew Flatt about efficient implementation of delimited control, and he | pointed me to a few papers a couple of months ago. One of those was “Final | Shift for call/cc: a Direct Implementation of Shift and Reset” by | Gasbichler and Sperber, which describes an approach closest to what I | currently have in mind to try to implement in the RTS. | | > One interesting dimension is whether or not the continuations you | capture are one-shot. If so, particularly efficient implementations are | possible. | | Quite so. One thing I’ve considered is that it’s possible to obtain much | of that efficiency even without requiring strict one-shot continuations if | you have a separate operation for restoring a continuation that guarantees | you won’t ever restore it again, sort of like the existing | unsafeThaw/unsafeFreeze operations. That is, you can essentially convert a | multi-shot continuation into a one-shot continuation and reap performance | benefits, even if you’ve already applied the continuation. | | This is a micro-optimization, though, so I’m not worrying too much about | it right now. | | > Also: much of the "capture stack chunk" stuff is *already* implemented, | because it is (I think) what happens when a thread receives an | asynchronous exception, and just abandon its evaluation of thunks that it | has started work on. | | Now that is ve
Re: Feasibility of native RTS support for continuations?
> On Jan 28, 2020, at 04:09, Simon Peyton Jones wrote: > > I've thought about this quite a bit in the past, but got stalled for lack of > cycles to think about it more. But there's a paper or two Many thanks! I’d stumbled upon the 2007 paper, but I hadn’t seen the 2016 one. In the case of the former, I had thought it probably wasn’t enormously relevant, since the “continuations” appear to be fundamentally one-shot. At first glance, that doesn’t seem to have changed in the JFP article, but I haven’t really read it yet, so maybe I’m mistaken. I’ll take a closer look. > On the effects front I think Daan Leijen is doing interesting stuff, although > I'm not very up to date: > https://www.microsoft.com/en-us/research/people/daan/publications/ Indeed, I’ve read a handful of his papers while working on this! I didn’t mention it in the original email, but I’ve also talked a little with Matthew Flatt about efficient implementation of delimited control, and he pointed me to a few papers a couple of months ago. One of those was “Final Shift for call/cc: a Direct Implementation of Shift and Reset” by Gasbichler and Sperber, which describes an approach closest to what I currently have in mind to try to implement in the RTS. > One interesting dimension is whether or not the continuations you capture are > one-shot. If so, particularly efficient implementations are possible. Quite so. One thing I’ve considered is that it’s possible to obtain much of that efficiency even without requiring strict one-shot continuations if you have a separate operation for restoring a continuation that guarantees you won’t ever restore it again, sort of like the existing unsafeThaw/unsafeFreeze operations. That is, you can essentially convert a multi-shot continuation into a one-shot continuation and reap performance benefits, even if you’ve already applied the continuation. This is a micro-optimization, though, so I’m not worrying too much about it right now. > Also: much of the "capture stack chunk" stuff is *already* implemented, > because it is (I think) what happens when a thread receives an asynchronous > exception, and just abandon its evaluation of thunks that it has started work > on. Now that is very interesting, and certainly not something I would have expected! Why would asynchronous exceptions need to capture any portion of the stack? Exceptions obviously trigger stack unwinding, so I assumed the “abort to the current prompt” part of my implementation would already exist, but not the “capture a slice of the stack” part. Could you say a little more about this, or point me to some relevant code? Thanks again, Alexis ___ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
RE: Feasibility of native RTS support for continuations?
Alexis I've thought about this quite a bit in the past, but got stalled for lack of cycles to think about it more. But there's a paper or two: https://www.microsoft.com/en-us/research/publication/composable-scheduler-activations-haskell/ On that link you can also see a link to an earlier, shorter, conference version (rejected ). Also this earlier (2007) work https://www.microsoft.com/en-us/research/publication/lightweight-concurrency-primitives-for-ghc/ On the effects front I think Daan Leijen is doing interesting stuff, although I'm not very up to date: https://www.microsoft.com/en-us/research/people/daan/publications/ One interesting dimension is whether or not the continuations you capture are one-shot. If so, particularly efficient implementations are possible. Also: much of the "capture stack chunk" stuff is *already* implemented, because it is (I think) what happens when a thread receives an asynchronous exception, and just abandon its evaluation of thunks that it has started work on. Simon | -Original Message- | From: ghc-devs On Behalf Of Alexis King | Sent: 28 January 2020 08:20 | To: ghc-devs | Subject: Feasibility of native RTS support for continuations? | | Hi all, | | tl;dr: I want to try to implement native support for capturing slices of | the RTS stack as a personal experiment; please tell me the obstacles I | am likely to run into. Much more context follows. | | --- | | I have been working on an implementation of an algebraic effect system | that uses unsafe primops to be as performant as possible. However, the | unavoidable need to CPS the entire program balloons heap allocation. | Consider the following definition: | | f a b = g a >>= \c -> h (b + c) | | Assume `g` and `h` are not inlined. If the monad used is IO, this will | be compiled efficiently: the result of `g` is returned on the stack, and | no closure needs to be allocated for the lambda. However, if the monad | supports capturing the continuation, the above definition must be CPS’d. | After inlining, we end up with | | f a b = \k -> let lvl = \c -> h (b + c) k in g a lvl | | which must allocate a closure on the heap. This is frustrating, as it | must happen for every call to a non-inlined monadic operation, even if | that operation never captures the continuation. In an algebraic effect | system, there are many shortcuts that avoid the need to capture the | continuation, and under my implementation, large swaths of code never do | so. I’ve managed to exploit that to get some savings, but I can’t escape | the need to allocate all these closures. | | This motivates my question: how difficult would it be to allow capturing | a portion of the RTS call stack directly? My requirements are fairly | minimal, as continuations go: | |1. Capturing a continuation is only legal from within a strict state | thread (i.e. IO or strict ST). | |2. The continuation is captured up to a prompt, which would be a new | kind of RTS stack frame. Prompts are not tagged, so there is only | ever exactly one prompt active at any time (which may be the root | prompt). | |3. Capturing a continuation is unsafe. The behavior of capturing a | continuation is undefined if the current prompt was not created by | the current state thread (and it is never legal to capture up to | the root prompt). | |4. Applying a continuation is unsafe. Captured continuations return | `Any`, and type safety is the caller’s obligation. | |5. Continuations are “functional,” which is to say applying them does | not trigger any additional stack unwinding. | | This minimal support for primitive continuation capturing would be | enough to support my efficient, safe delimited control implementation. | In my ignorant mind, implementing this ought to be as simple as defining | two new primops, | | reset# :: (State# s -> (# State# s, a #)) | -> State# s -> (# State# s, a #) | | shift# :: ((a -> State# s -> (# State# s, Any #)) | -> State# s -> (# State# s, Any #)) | -> State# s -> (# State# s, a #) | | where reset# pushes a new prompt frame and shift# captures a slice of | the RTS stack up to that frame and copies it into the heap. Restoring a | continuation would copy all the captured frames onto the end of the | current stack. Sounds simple enough! | | I would like to experiment with implementing something like this myself, | just to see if it would really work, but somehow I doubt it is actually | as simple as it sounds. Minor complications are fine, but what worries | me are major obstacles I haven’t found in my limited attempts to learn | about the RTS. | | So far, I’ve read the old “The New GHC/Hugs Runtime System” paper, which | still seems mostly accurate from a high level, though I imagine many | details have changed since then. I’ve