Re: [PHP-DEV] PHP True Async RFC

Rob Landers Sun, 02 Mar 2025 06:10:59 -0800


On Sat, Mar 1, 2025, at 10:11, Edmond Dantes wrote:
> Good day, everyone. I hope you're doing well.
> 
> I’d like to introduce a draft version of the RFC for the True Async component.
> 
> https://wiki.php.net/rfc/true_async
> 
> I believe this version is not perfect and requires analysis. And I strongly 
> believe that things like this shouldn't be developed in isolation. So, if you 
> think any important (or even minor) aspects have been overlooked, please 
> bring them to attention.
> 
> The draft status also highlights the fact that it includes doubts about the 
> implementation and criticism. The main global issue I see is the lack of 
> "future experience" regarding how this API will be used—another reason to 
> bring it up for public discussion.
> 
> Wishing you all a great day, and thank you for your feedback!
>


Hey Edmond:

I find this feature quite exciting! I've got some feedback so far, though most 
of it is for clarification or potential optimizations:

> A PHP developer *SHOULD NOT* make any assumptions about the order in which 
> Fibers will be executed, as this order may change or be too complex to 
> predict.

There should be a defined ordering (or at least, some guarantees). Being able 
to understand what things run in what order can help with understanding a 
complex system. Even if it is just a vague notion (user tasks are processed 
before events, or vice versa), it would still give developers more confidence 
in the code they write. You actually describe part of the ordering later 
(microtasks run before fibers/events), so this sentence may not quite make 
sense as written.

Personally, I think an async task should run like a function call until it 
hits a suspension. This is mostly an optimization (C# does this), but it could 
reduce the overhead of queueing a function that may never suspend (which you 
mention as a potential problem much later on):

Async\run(function() {

   $fiber = Async\async(function() {
       sleep(1); // this gets enqueued now
       return "Fiber completed!";
   });

   // Execution is paused until the fiber completes
   $result = Async\await($fiber); // immediately enter $fiber without queuing

   echo $result . "\n";

   echo "Done!\n";
});
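For comparison, PHP's built-in Fiber class (PHP 8.1+) already starts eagerly: Fiber::start() runs the callable synchronously, like a function call, until the first Fiber::suspend(). The sketch below uses only that built-in API, not the proposed Async functions; the manual resume() call stands in for what a scheduler would do.

```php
<?php
// Native Fibers: start() enters the fiber immediately and returns only when
// the fiber suspends (or finishes). No queueing happens before that point.
$log = [];

$fiber = new Fiber(function () use (&$log): string {
    $log[] = 'fiber: started';          // runs inside $fiber->start()
    $value = Fiber::suspend('waiting'); // first real suspension point
    $log[] = "fiber: resumed with $value";
    return 'Fiber completed!';
});

$log[] = 'main: before start';
$suspendedWith = $fiber->start();       // returns the value given to suspend()
$log[] = "main: fiber suspended with $suspendedWith";
$fiber->resume('data');                 // stand-in for a scheduler resuming it
$log[] = 'main: result ' . $fiber->getReturn();

echo implode("\n", $log), "\n";
```

A scheduler built this way would only pay the queueing cost once a task actually suspends.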

> Until it is activated, PHP code behaves as before: calls to blocking 
> functions will block the execution thread and will not switch the *Fiber* 
> context. Thus, code written without the *Scheduler* component will function 
> exactly the same way, without side effects. This ensures backward 
> compatibility.

I'm not sure I understand this. Won't PHP code behave exactly as it did before 
once the scheduler is enabled? Will libraries written before this feature 
existed suddenly behave differently? Do we need to worry about the color of 
functions because it changes the behavior?

> `True Async` prohibits initializing the `Scheduler` twice.

How will a library take advantage of this feature if it cannot be certain the 
scheduler is running or not? Do I need to write a library for async and another 
version for non-async? Or do all the async functions with this feature work 
without the scheduler running, or do they throw a catchable error?

> This is crucial because the process may handle an OS signal that imposes a 
> time limit on execution (for example, as Windows does).

Will this change the way OS signals are handled, then? Will it break 
compatibility if a library uses pcntl traps and I'm using True Async traps too? 
Note that PHP already handles (timeout) signals in several different ways, so 
if (per chance) the scheduler could always be running, maybe we could unify the 
way signals are handled in PHP.

> Code that uses *Resume* cannot rely on when exactly the *Fiber* will resume 
> execution.

What if it never resumes at all? Will a `finally` block run if the code is 
inside a try block, or will execution just be abandoned? Is there some way to 
ensure resources are cleaned up? The RFC should probably mention this case and 
explain how abandoning execution works.
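For reference, native PHP 8.1 Fibers already answer part of this: when a suspended fiber is destroyed without ever being resumed, PHP unwinds it and runs its pending `finally` blocks. The sketch shows that baseline using only the built-in Fiber class; whether the RFC's scheduler keeps this guarantee for fibers it abandons is exactly the open question.

```php
<?php
// A suspended fiber that is never resumed gets unwound when the Fiber object
// is destroyed, and its pending finally block runs at that moment.
$cleanedUp = false;

$fiber = new Fiber(function () use (&$cleanedUp): void {
    try {
        Fiber::suspend();        // nobody will ever resume this fiber
        echo "never reached\n";
    } finally {
        $cleanedUp = true;       // executed while the fiber is destroyed
    }
});

$fiber->start();
var_dump($cleanedUp);            // still suspended: cleanup has not run yet
unset($fiber);                   // drop the last reference to the fiber
var_dump($cleanedUp);            // finally block has now run
```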

> If an exception is thrown inside a fiber and not handled, it will stop the 
> Scheduler and be thrown at the point where `Async\launchScheduler()` is 
> called.

The RFC doesn't mention the stack trace. Will it throw away any information 
about the inner exception?
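Native Fibers already propagate an uncaught exception out of Fiber::start() or Fiber::resume() with its original trace intact. If the scheduler re-throws at `Async\launchScheduler()`, one way to keep that information is ordinary exception chaining through the `$previous` constructor argument. In this sketch the wrapping step is a hypothetical stand-in for the scheduler, not the RFC's actual behavior.

```php
<?php
function failingTask(): never
{
    throw new LogicException('boom inside fiber');
}

$fiber = new Fiber(function (): void {
    failingTask();
});

try {
    try {
        $fiber->start(); // the exception escapes the fiber here
    } catch (LogicException $inner) {
        // Hypothetical: a scheduler re-throwing at its entry point while
        // keeping the original exception (and its trace) as $previous.
        throw new RuntimeException('Unhandled exception in fiber', 0, $inner);
    }
} catch (RuntimeException $outer) {
    $previous = $outer->getPrevious();
    echo $previous->getMessage(), "\n";
    echo $previous->getTraceAsString(), "\n"; // still includes failingTask()
}
```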

> The *Graceful Shutdown* mode can also be triggered using the function:

What will calling `exit` or `die` do?

> A concurrent runtime allows handling requests using Fibers, where each Fiber 
> can process its own request. In this case, storing request-associated data in 
> global variables is no longer an option.

Why is this the case? Furthermore, if a fiber's context is inherited from the 
fiber that started it, won't using Resume/Notifier manually potentially cause 
problems? There are examples throughout the RFC that use global variables in 
closures; do these examples not actually work? Will sharing object instances 
between the functions break things? For example:

Async\run($obj->method1(...));
Async\run($obj->method2(...));

This is technically sharing global variables (well, global to that scope -- 
global is just a scope after all) -- so what happens here? Would it make sense 
to delegate this fiber-local storage to user-land libraries instead?
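As a rough illustration of the user-land option, fiber-local storage can be built today from the built-in `WeakMap` and `Fiber::getCurrent()`. The `FiberLocal` class is hypothetical. Note that it deliberately does not inherit values from a parent fiber; each fiber starts with the default, which sidesteps the inheritance question above.

```php
<?php
// Hypothetical user-land fiber-local storage: one value per Fiber, kept in a
// WeakMap so an entry disappears when its fiber object is destroyed.
final class FiberLocal
{
    private WeakMap $values;

    public function __construct(private mixed $default = null)
    {
        $this->values = new WeakMap();
    }

    public function get(): mixed
    {
        $fiber = Fiber::getCurrent();
        if ($fiber === null) {
            return $this->default; // called from outside any fiber
        }
        return $this->values[$fiber] ?? $this->default;
    }

    public function set(mixed $value): void
    {
        $fiber = Fiber::getCurrent()
            ?? throw new LogicException('FiberLocal::set() called outside a fiber');
        $this->values[$fiber] = $value;
    }
}

// Two fibers share the same $requestId object but each sees its own value.
$requestId = new FiberLocal('none');
$seen = [];
$a = new Fiber(function () use ($requestId, &$seen): void {
    $requestId->set('req-A');
    Fiber::suspend();
    $seen[] = $requestId->get();
});
$b = new Fiber(function () use ($requestId, &$seen): void {
    $requestId->set('req-B');
    Fiber::suspend();
    $seen[] = $requestId->get();
});
$a->start();
$b->start();
$a->resume();
$b->resume();
$seen[] = $requestId->get(); // main script: not inside a fiber
print_r($seen);
```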

> Objects of the `Future` class are high-level patterns for handling deferred 
> results. 

By this point we have covered FiberHandle, Resume, and Contexts. Now we have 
Futures? Can we simplify this to just Futures? Why do we need all these 
different ways to handle execution?

> A channel is a primitive for message exchange between `Fibers`.

Why is there an `isEmpty` and `isNotEmpty` function? Wouldn't 
`!$channel->isEmpty()` suffice?

It's also not clear what the value of most of these functions is. For example:

if ($chan->isFull()) {
  doSomething(); // suspends at some point inside? We may not know when we write the code.
  // chan is no longer full, or maybe it is -- who knows, but the original
  // assumption entering this branch is no longer true.
  ...
}

Whether a channel is full or not is not really important, and if you rely on 
that information, this is usually an architectural smell (at least in other 
languages). Same thing with empty or writable, or many others of these 
functions. You basically just write to a channel and eventually (or not, which 
is a bug and causes a deadlock) something will read it. The entire point is to 
use channels to decouple async code, but most of the functions here allow for 
code to become strongly coupled.

As for the single producer method, I am not sure why you would use this. I can 
see some upside for the built-in constraints (potentially in a dev-mode 
environment) but in a production system, single-producer bottlenecks are a real 
thing that can cause serious performance issues. This is usually something you 
explicitly want to avoid.

> In addition to the `send/receive` methods, which suspend the execution of a 
> `Fiber`, the channel also provides non-blocking methods: `trySend`, 
> `tryReceive`, and auxiliary explicit blocking methods: `waitUntilWritable` 
> and `waitUntilReadable`. 

It isn't clear what happens when `trySend` fails. Is this an error, or does it 
silently do nothing?
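One possible contract, sketched here as an assumption rather than the RFC's API: `trySend` returns `false` and leaves the buffer untouched when it is full, so the caller explicitly decides whether to drop, retry, or fall back to a blocking send. `BoundedBuffer` is a hypothetical class; no fibers are involved because `trySend` never suspends.

```php
<?php
// Hypothetical non-blocking send: failure is reported through the return
// value instead of throwing or silently dropping the message.
final class BoundedBuffer
{
    private SplQueue $items;

    public function __construct(private int $capacity)
    {
        $this->items = new SplQueue();
    }

    public function trySend(mixed $value): bool
    {
        if (count($this->items) >= $this->capacity) {
            return false; // full: nothing is enqueued
        }
        $this->items->enqueue($value);
        return true;
    }
}

$buffer = new BoundedBuffer(capacity: 2);
$results = [$buffer->trySend('a'), $buffer->trySend('b'), $buffer->trySend('c')];
var_dump($results); // the third message is rejected, not queued
```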

Thinking through it, there may be cases where `trySend` is valid, but more 
often than not, it is probably an antipattern. I cannot think of a valid reason 
for `tryReceive`, and its usage is almost guaranteed to cause a deadlock in real 
code. For truly multi-threaded applications it makes more sense, but not for 
single-threaded concurrency like this.

In other words, the following code is likely to be more robust, and not depend 
on execution order (which we are told at the beginning not to do):

Async\run(function() {
    $channel = new Async\Channel();

    $reader = Async\async(function() use ($channel) {
        // parenthesize the assignment: `&&` binds tighter than `=`
        while (($data = $channel->receive()) !== null) {
            echo "receive: $data\n";
        }
    });

    for ($i = 0; $i < 4; $i++) {
        echo "send: event data $i\n";
        $channel->send("event data $i");
    }

    $reader->cancel(); // clean up our reader
    // or
    $channel->close(); // will receive NULL I believe?
});
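Since `Async\Channel` cannot be run yet, the same reader/writer pattern can be sketched on today's built-in Fiber class. `Channel`, `runAll`, and the "closed and drained returns null" convention are hypothetical, mirroring the assumption in the comment above rather than anything the RFC specifies.

```php
<?php
// Hypothetical sketch: a tiny buffered channel plus a round-robin run queue,
// built only on native PHP 8.1 Fibers. This is not the RFC's Async\Channel.
final class Channel
{
    private SplQueue $buffer;
    private SplQueue $blockedReceivers; // fibers suspended in receive()
    private SplQueue $blockedSenders;   // fibers suspended in send()
    private bool $closed = false;

    public function __construct(private SplQueue $runQueue, private int $capacity = 1)
    {
        $this->buffer = new SplQueue();
        $this->blockedReceivers = new SplQueue();
        $this->blockedSenders = new SplQueue();
    }

    public function send(mixed $value): void
    {
        while (!$this->closed && count($this->buffer) >= $this->capacity) {
            $this->blockedSenders->enqueue(Fiber::getCurrent()); // buffer full: wait
            Fiber::suspend();
        }
        if ($this->closed) {
            throw new LogicException('send() on a closed channel');
        }
        $this->buffer->enqueue($value);
        $this->wakeOne($this->blockedReceivers);
    }

    public function receive(): mixed
    {
        while ($this->buffer->isEmpty()) {
            if ($this->closed) {
                return null; // closed and drained, so null cannot be sent as data
            }
            $this->blockedReceivers->enqueue(Fiber::getCurrent()); // buffer empty: wait
            Fiber::suspend();
        }
        $value = $this->buffer->dequeue();
        $this->wakeOne($this->blockedSenders);
        return $value;
    }

    public function close(): void
    {
        $this->closed = true; // wake every blocked fiber so it sees the closed state
        foreach ([$this->blockedReceivers, $this->blockedSenders] as $waiters) {
            while (!$waiters->isEmpty()) {
                $this->runQueue->enqueue($waiters->dequeue());
            }
        }
    }

    private function wakeOne(SplQueue $waiters): void
    {
        if (!$waiters->isEmpty()) {
            $this->runQueue->enqueue($waiters->dequeue());
        }
    }
}

// Minimal round-robin scheduler: start or resume fibers until none is runnable.
function runAll(SplQueue $runQueue): void
{
    while (!$runQueue->isEmpty()) {
        $fiber = $runQueue->dequeue();
        $fiber->isStarted() ? $fiber->resume() : $fiber->start();
    }
}

$runQueue = new SplQueue();
$channel = new Channel($runQueue, capacity: 1);
$log = [];

// Reader: loops until the channel is closed and drained.
$runQueue->enqueue(new Fiber(function () use ($channel, &$log): void {
    while (($data = $channel->receive()) !== null) {
        $log[] = "receive: $data";
    }
    $log[] = 'reader done';
}));

// Writer: sends four messages, then closes the channel.
$runQueue->enqueue(new Fiber(function () use ($channel, &$log): void {
    for ($i = 0; $i < 4; $i++) {
        $log[] = "send: event data $i";
        $channel->send("event data $i");
    }
    $channel->close();
}));

runAll($runQueue);
echo implode("\n", $log), "\n";
```

Neither fiber ever asks whether the channel is empty or full; each simply blocks in `send()` or `receive()`, which keeps the two sides decoupled.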

A `trySend` is still useful when you want to send a message but don't want to 
block if the channel is full. However, whether it succeeds will largely depend 
on how long it has been since the developer last suspended the current fiber, 
and nothing else. So it is probably an antipattern, since it depends on the 
literal structure of the code rather than the structure of the program, if that 
makes sense.

> This means that `trapSignal` is not intended for “regular code” and should 
> not be used “anywhere”.

Can you expand on what this means in the RFC? Why expose it if it shouldn't be 
used?

-----

I didn't go into the low-level API details yet; this email is already pretty 
long. But I would suggest thinking about how to unify 
Notifiers/Resume/FiberHandle/Future into a single thing. From a developer's 
standpoint these are all pretty similar (a way to continue execution), yet each 
offers a slightly different API.
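To illustrate, one hypothetical shape for such a unified object is sketched below on native Fibers: a single `Future` that fibers await and that other code completes or fails exactly once. The class and method names are assumptions, not the RFC's Future API, and it resumes waiters directly where a real scheduler would enqueue them.

```php
<?php
// Hypothetical unified "Future": a fiber awaits it, and some other code
// settles it exactly once. Built on native PHP 8.1 Fibers only.
final class Future
{
    private bool $done = false;
    private mixed $value = null;
    private ?Throwable $error = null;
    /** @var list<Fiber> fibers currently suspended in await() */
    private array $waiters = [];

    public function complete(mixed $value): void
    {
        $this->settle($value, null);
    }

    public function fail(Throwable $error): void
    {
        $this->settle(null, $error);
    }

    public function await(): mixed
    {
        if (!$this->done) {
            $this->waiters[] = Fiber::getCurrent()
                ?? throw new LogicException('await() must be called inside a fiber');
            Fiber::suspend(); // resumed by settle()
        }
        if ($this->error !== null) {
            throw $this->error;
        }
        return $this->value;
    }

    private function settle(mixed $value, ?Throwable $error): void
    {
        if ($this->done) {
            throw new LogicException('Future already settled');
        }
        $this->done = true;
        $this->value = $value;
        $this->error = $error;
        $waiters = $this->waiters;
        $this->waiters = [];
        foreach ($waiters as $fiber) {
            $fiber->resume(); // a real scheduler would enqueue the fiber instead
        }
    }
}

$future = new Future();
$received = [];
foreach (['A', 'B'] as $name) {
    $fiber = new Fiber(function () use ($future, $name, &$received): void {
        $received[] = "$name got " . $future->await();
    });
    $fiber->start(); // runs until await() suspends
}
$future->complete(42);
echo implode("\n", $received), "\n";
```

The same object covers "wait for a result", "resume a suspended fiber", and "report a failure", which is the overlap between the four concepts.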

I also noticed that you seem to be relying heavily on the current 
implementation to define behavior. Ideally, the RFC should define the behavior, 
and the implementation should implement that behavior as described. In other 
words, the RFC serves as the reference point for deciding whether something is 
a bug or an enhancement in the future. More than once, the list has looked back 
at an old RFC to determine the original intent, and thus whether something is 
working as intended or is a bug. RFCs are also used to write documentation, so 
the more detailed the RFC, the better the documentation will be for new users 
of PHP.

— Rob
