Re: [PHP-DEV] PHP True Async RFC

Larry Garfield Sun, 09 Mar 2025 21:45:53 -0700

On Sun, Mar 9, 2025, at 11:56 AM, Edmond Dantes wrote:

> *Let me summarize the current state for today:*
>
>  1. I am abandoning `startScheduler` and the idea of preserving 
> backward compatibility with `await_all` or anything else in that 
> category. The scheduler will be initialized implicitly, and this does 
> not concern user-land. Consequently, the `spawn function()` code will 
> work everywhere and always.
>
>  2. I will not base the implementation on `Fiber` (perhaps only on the 
> low-level part). Instead of `Fiber`, there will be a separate class. 
> There will be no changes to `Fiber` at all. This decision follows the 
> principle of Win32 COM/DCOM: old interfaces should never be changed. If 
> an old interface needs modification, it should be given a new name. 
> This should have been done from the start.
>
>  3. I am abandoning low-level objects in PHP-land (FiberHandle, 
> SocketHandle etc). Over time, no one has voted for them, which means 
> they are unnecessary. There might be a low-level interface for 
> compatibility with Revolt.
>
>  4. It might be worth restricting microtasks in PHP-land and keeping 
> them only for C code. This would simplify the interface, but we need to 
> ensure that it doesn’t cause any issues.
>
>
> The remaining question on the agenda: deciding which model to choose — 
> *parent-child* or the *Go-style model*.


As noted, I am in broad agreement with the previously linked article on 
"playpens" (even if I hate that name): the "Go-style model" is too 
analogous to goto statements.

Basically, this is asking "so do we use gotos or for loops?"  For which the 
answer is, I hope obviously, for loops.

Offering both, frankly, undermines the whole point of having structured, 
predictable concurrency.  The entire goal of that is to be able to know if 
there's some stray fiber running off in the background somewhere still doing 
who knows what, manipulating shared data, keeping references to objects, and 
other nefarious things.  With a nursery, you don't have that problem... *but 
only if you remove goto*.  A language with both a for loop and an arbitrary 
goto statement gets basically no systemic benefit from having the for loop, 
because neither developers nor compilers get any guarantees of what will or 
won't happen.

Especially when, as demonstrated, the "this can run in the background and I 
don't care about the result" use case can be solved more elegantly with nested 
blocks and channels, and in a way that, in practice, would probably get 
subsumed into DI Containers eventually so most devs don't have to worry about 
it.

Of interesting note along similar lines would be Rust, and... PHP. 

Rust's whole thing is memory safety.  The language simply will not let you 
write memory-unsafe code, even if it means the code is a bit more verbose as a 
result.  In exchange for the borrow checker, you get enough memory guarantees 
to write extremely safe parallel code.  However, the designers acknowledge that 
occasionally you do need to turn off the checker and do something manually... 
in very edge-y cases in very small blocks set off with the keyword "unsafe".  
Viz, "I know what I'm doing is stupid, but trust me."  The discouragement of 
doing so is built into the language, and tooling, and culture.

PHP... has a goto operator.  It was added late, kind of as a joke, but it's 
there.  However, it is not a full goto.  It can only jump within the current 
function, and only out of control structures, never into them.  It's basically 
a named break.  While it only rarely has value, it's not all that harmful 
unless you do something really dumb with it.  And then it's only harmful within 
the scope of the function that uses it.  And, very very rarely, there's some 
micro-optimization to be had.  (cf. this classic: 
https://github.com/igorw/retry/issues/3).  But PHP has survived quite well for 
30 years without an arbitrary goto statement.
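
To make the "named break" point concrete, here is a minimal sketch in current 
PHP.  The findPair() helper and its inputs are made up purely for illustration; 
the only language behavior it relies on is that goto may jump out of loops 
within the same function, but not into a loop or into another function.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: return the first pair of values whose product
// equals $target, or null if there is no such pair.
function findPair(array $numbers, int $target): ?array
{
    $found = null;
    foreach ($numbers as $a) {
        foreach ($numbers as $b) {
            if ($a * $b === $target) {
                $found = [$a, $b];
                goto done; // Leaves both loops at once, like `break 2`.
            }
        }
    }
    done:
    return $found;
}
```

A plain `break 2` would do the same job here, which is rather the point: 
whatever a PHP goto does, it stays inside the one function that uses it.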

So if we start from a playpen-like, structured concurrency assumption, which 
(as demonstrated) gives us much more robust code that is easier to follow and 
still covers nearly all use cases, there are two questions to answer:

1. Is there still a need for an "unsafe {}" block or in-function goto 
equivalent?
2. If so, what would that look like?

I am not convinced of 1 yet, honestly.  But if it really is needed, we should 
be targeting the least-uncontrolled option possible to allow for those edge 
cases.  A quick-n-easy "I'mma violate the structured concurrency guarantees, 
k?" undermines the entire purpose of structured concurrency.

> During our discussion, everything seems to be converging on the idea 
> that the changes introduced by the RFC into `Fiber` would be better 
> moved to a separate class. This would reduce confusion between the old 
> and new solutions. That way, developers wouldn't wonder why `Fiber` and 
> coroutines behave differently—they are simply different classes.
> The new *Coroutine* class could have a different interface with new 
> logic. This sounds like an excellent solution.
>
> The interface could look like this:
>
>  • *`suspend`* (or another clear name) – a method that explicitly hands 
> over execution to the *Scheduler*.
>  • *`defer`* – a handler that is called when the coroutine completes.
>  • *`cancel`* – a method to cancel the coroutine.
>  • *`context`* – a property that stores the execution context.
>  • *`parent`* (public property or `getParent()` method) – returns the 
> parent coroutine.
> (*Just an example for now.*)
>
> The *Scheduler* would be activated automatically when a coroutine is 
> created. If the `index.php` script reaches the end, the interpreter 
> would wait for the *Scheduler* to finish its work under the hood.
>
> Do you like this approach?

That API is essentially what I was calling "AsyncContext" before.  I am 
flexible on the name, as long as it is descriptive and gives the user the right 
mental model. :-)  (I'm not sure if Coroutine would be the right name either, 
since in what I was describing it's the spawn command that starts a coroutine; 
the overall async scope is the container for several coroutines.)

But perhaps that is a sufficient "escape hatch"?  Spitballing again:

async $nursery { // Formerly AsyncContext
  // Runs at the end of this nursery scope.
  $nursery->defer($fn);

  // This creates and starts a coroutine, in this scope.
  $future = $nursery->spawn($fn);

  // A short-hand for "spawn this coroutine, in whatever the nearest
  // async nursery scope is."
  // aka, an alias for the above line, but doesn't require passing
  // $nursery around.
  $future = spawn $fn;

  // If you want.
  $future->cancel();

  // See below.
  $nursery->spawn(stuff(...));
} // This blocks until escape() finishes, too, because it was bound to
  // this scope.

function stuff() {
  async $inner {
    // This is bound to the $inner scope; $inner cannot end
    // until this is complete.  This is by design.
    spawn $inner;

    // This spawns a new coroutine on the parent scope, if any.
    // If there isn't one, $inner->parent is null so it falls back
    // to the current scope.
    // One could technically climb the entire tree to the top-most
    // scope and spawn a coroutine there.  It would be a bit annoying
    // to do, but, as noted, that's a good thing, because you shouldn't
    // be doing that 99.9% of the time!
    // Channels are better 99.9% of the time.
    ($inner->parent ?? $inner)->spawn(escape(...));
  }
}

I'm not sure I fully like the above.  I don't know if it still makes the 
guarantees too weak.  But it does offer a limited, partial escape hatch, so it 
may be an acceptable compromise.
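
For anyone who wants to poke at these scoping rules before any implementation 
exists, they can be roughly approximated in today's PHP.  This is only a sketch 
under loud assumptions: the Nursery class and its run(), spawn(), and defer() 
methods are hypothetical names, Fiber (PHP 8.1+) merely stands in for the 
proposed coroutines, the parent scope is passed explicitly rather than 
discovered, and spawned coroutines start only once the scope is joined.

```php
<?php
declare(strict_types=1);

// ASSUMPTION: `Nursery` and all of its methods are hypothetical. Nothing like
// this exists in PHP or in the RFC; it only mimics the scoping rules.
final class Nursery
{
    /** @var array<int, Fiber> coroutines bound to this scope */
    private array $fibers = [];

    /** @var list<callable> callbacks to run when the scope closes */
    private array $deferred = [];

    private function __construct(public readonly ?Nursery $parent) {}

    // Open a scope, run $body, and return only after every coroutine
    // spawned on that scope has finished.
    public static function run(callable $body, ?Nursery $parent = null): void
    {
        $nursery = new self($parent);
        $body($nursery);
        $nursery->join();
    }

    // Bind a new coroutine to this scope; it starts when the scope is joined.
    public function spawn(callable $fn): Fiber
    {
        return $this->fibers[] = new Fiber($fn);
    }

    public function defer(callable $fn): void
    {
        $this->deferred[] = $fn;
    }

    // Naive round-robin scheduler: keep stepping until nothing is left.
    private function join(): void
    {
        while ($this->fibers !== []) {
            foreach ($this->fibers as $key => $fiber) {
                if (!$fiber->isStarted()) {
                    $fiber->start();
                } elseif ($fiber->isSuspended()) {
                    $fiber->resume();
                }
                if ($fiber->isTerminated()) {
                    unset($this->fibers[$key]);
                }
            }
        }
        foreach ($this->deferred as $fn) {
            $fn();
        }
    }
}

$log = [];
Nursery::run(function (Nursery $nursery) use (&$log): void {
    $nursery->defer(function () use (&$log): void { $log[] = 'deferred'; });

    // Plays the role of stuff(): opens an inner scope.
    $nursery->spawn(function () use ($nursery, &$log): void {
        Nursery::run(function (Nursery $inner) use (&$log): void {
            // Bound to $inner; $inner cannot end until this completes.
            $inner->spawn(function () use (&$log): void {
                $log[] = 'inner start';
                Fiber::suspend();
                $log[] = 'inner end';
            });
            // The escape hatch: bound to the parent scope instead.
            ($inner->parent ?? $inner)->spawn(function () use (&$log): void {
                $log[] = 'escape';
            });
        }, $nursery);
        $log[] = 'stuff end';
    });
});
$log[] = 'after scope';

echo implode(', ', $log), PHP_EOL;
// inner start, inner end, stuff end, escape, deferred, after scope
```

The escaped coroutine outlives the inner scope but not the outer one: 
"after scope" is only reached once it has run, which is the limited guarantee 
the escape hatch is meant to preserve.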

It would be valuable to take this idea (or whatever we end up with) to experts 
in other languages with better async models than JS, and maybe a few academics, 
to let them poke obvious-to-them holes in it.

Edmond, does that make any sense to you?

--Larry Garfield
