Hi all.
Reposting here what I already posted
(https://github.com/true-async/php-true-async-rfc/discussions/8#discussioncomment-15074303)
in the discussion on php-true-async-rfc (the discussion should really
move there from this list IMO).
Senior software artisan here with 15 years of PHP experience, 8 years of
async PHP experience (amphp v2, then v3), 5 years of Go experience, 3
years of Rust experience, and much more.
My recommendation for async PHP is:
- Stackful coroutines
- Colorless functions
- Existing IO functions should be async
- Some memory isolation: shared statics maybe with a separate attribute
(i.e. allow fiber-local statics which have uses, but also allow normal
statics as is already the case to allow for caching); while getting rid
of globals (not statics) in general would be nice, I feel like it would
be out of scope for this RFC.
Race conditions are **not** an issue, just as they are not an issue in
the vast majority of languages, as there is a variety of tools and
models to deal with them: channels (Go), mutexes (Go, amphp, Rust, C,
C++, etc.), the actor model (any language), etc.
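As a minimal Go sketch of the mutex option: a shared counter map is made safe for concurrent use by guarding every access with a `sync.Mutex`. The names `Counter` and `countConcurrently` are hypothetical, chosen only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter is a hypothetical shared structure. The mutex makes its
// methods safe to call from many goroutines at once.
type Counter struct {
	mu sync.Mutex
	n  map[string]int
}

func NewCounter() *Counter { return &Counter{n: make(map[string]int)} }

// Inc increments the counter for key while holding the lock.
func (c *Counter) Inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n[key]++
}

// Get reads the counter for key while holding the lock.
func (c *Counter) Get(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n[key]
}

// countConcurrently increments the same key from `workers` goroutines
// and returns the final value once all of them have finished.
func countConcurrently(workers int) int {
	c := NewCounter()
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc("updates")
		}()
	}
	wg.Wait()
	return c.Get("updates")
}

func main() {
	fmt.Println(countConcurrently(100)) // prints 100
}
```

Without the lock, concurrent writes to the map would be a data race; with it, the result is always equal to the number of workers.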
This model matches the status quo in golang and amphp v3: from my 8
years of experience in writing both async **business logic** and
**abstraction logic** in multiple languages, this is the best
concurrency model.
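To make "stackful and colorless" concrete, here is a minimal Go sketch. `fetch` is a hypothetical stand-in for a blocking IO call (the latency is simulated with `time.Sleep`), and `fetchAll` is an equally hypothetical helper. The point is that `fetch` has an ordinary signature and is called the same way whether it runs sequentially or concurrently.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fetch is a hypothetical stand-in for a blocking IO call. Nothing in
// its signature marks it as "async", and callers need no await.
func fetch(id int) string {
	time.Sleep(10 * time.Millisecond) // simulated network latency
	return fmt.Sprintf("result-%d", id)
}

// fetchAll runs fetch concurrently, one goroutine per id. Each
// goroutine writes to its own slice element, so results keep the
// input order and no lock is needed.
func fetchAll(ids []int) []string {
	out := make([]string, len(ids))
	var wg sync.WaitGroup
	for i, id := range ids {
		wg.Add(1)
		go func(i, id int) {
			defer wg.Done()
			out[i] = fetch(id) // same call as in sequential code
		}(i, id)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(fetch(1))                 // sequential call
	fmt.Println(fetchAll([]int{1, 2, 3})) // concurrent calls
}
```

If `fetch` later gains or loses IO, none of its callers change, which is exactly the property a colored model lacks.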
Some evidence:
I maintain [MadelineProto](https://github.com/danog/MadelineProto/), the
biggest PHP MTProto client.
MTProto is an async binary protocol over TCP (no HTTP involved); it
requires heavy caching in order for clients to function correctly, and a
lot of race-heavy abstractions.
MadelineProto is a framework which contains a fully async MTProto client.
I migrated MadelineProto to async PHP with amphp v2 (colored, stackless
async) in 2018, and then to amphp v3 (colorless, stackful async) in 2022.
- The biggest problem while using v2 was the colored approach, requiring
the use of await (yield, using await for clarity from here on) every
time a function may become async: this is the status quo of some
languages like JS, but in reality **it becomes a giant pain when writing
a lot of business logic**: not only that, there were constant **breaking
changes** every time a method that previously wasn't async (i.e. had no
network/IO logic) suddenly became async (thus requiring the use of await).
In MadelineProto, I worked around this with a custom coroutine
runtime (replacing that of amphp v2), which allowed users to `await`
even non-async functions: by then forcing users to always await all
methods of the framework, I managed to avoid breaking changes every time
an abstraction became async.
Clearly, this was a crutch, made to work around a big usability issue
for end users, caused by colored functions.
Another issue was with the stackless approach, which essentially
forced the use of `call` to spawn a new coroutine (creating a new
generator) every time an async function was invoked: I again worked
around this in my custom coroutine runtime, allowing users to spawn a
new coroutine simply by using `await` (`yield`).
This in turn led to the creation of a new generator object every time
`await` was used: I was also able to work around this, transforming
every `yield` to a `yield from`: this reduced the overhead, but it was
clearly yet another crutch, made to transform a stackless approach into
a more usable stackful approach.
- The switch to amphp v3 finally got rid of the colored approach,
switching to a stackful, colorless approach.
This was a huge improvement in terms of developer UX: no more
worrying about when to use `await`, no more custom coroutine runtimes to
implement stackfulness to avoid using `call()` *and* `await` every time
an async function is called, all **without impacting safety**.
Amphp v3's model is pretty close to golang's model, and mirrors go's
ease of use.
- When migrating the MadelineProto framework to amphp, right away, it
was clear to me that with concurrency, race conditions would be an issue.
Amphp v2 already offered synchronization primitives in the form of
mutexes, which are more than enough to guarantee safety.
However, I wanted an easier approach for me as the library
developer, which is why I, right away, chose to communicate instead of
sharing memory, by adopting the [actor
model](https://en.wikipedia.org/wiki/Actor_model) for MadelineProto's
internal core modules (IO, update handling, any moderately large logic
requiring synchronization).
Mutexes are still used for isolated, smaller logic, as they provide
equivalent safety with less boilerplate compared to channels/actors.
**A shared-nothing approach only adds needless complexity and
overhead, when the alternative is simply to isolate logic needing
synchronization in a separate actor, or to add a few mutexes in key
places, as is already done in many languages such as Go, Rust
(multithreaded with interior mutability), Java, C, C++, etc.**
- An issue that is still present to this day in amphp is the requirement
to switch to their own IO API in order to write async logic (admittedly
a lot better than PHP's stdlib): this is a **huge** issue especially for
PHP developers that have only ever used PHP's own stdlib (curl, etc), or
common libraries like guzzle.
For example: MadelineProto exposes MTProto events through an event
handler, which is a class with appropriately decorated user-defined
methods which handle chosen events concurrently (each event is handled
in a new coroutine).
If users use native PHP functions within those methods, they block
execution of:
- All other events
- Most importantly, the library itself, which **requires** the
periodic execution of time-sensitive operations in order to maintain
connection with the MTProto servers (if MTProto updates aren't acked
within a specific time frame, they are either resent or the connection
is terminated by the server, plus `ping_delay_disconnect` must be
emitted periodically to signal liveness of the connection itself)
To work around this, I run static analysis on code users write,
warning them if they use non-async stdlib functions and libraries in the
event handler, suggesting async alternatives (and also offering simple
to use synchronization primitives).
To this day, warnings emitted by the static analysis are the single
biggest hurdle to thousands of new users of my framework, who simply
do not understand why they can't just use `file_get_contents` or `curl`
instead of amphp/http-client, amphp/file, etc.
The same thing will happen to PHP if a split approach is chosen (the
old stdlib remains blocking, and a new async stdlib is made).
To those who think making all of the PHP stdlib async will break
stuff, I say: **it is a misconception that making all PHP IO functions
async will break existing applications**.
Existing legacy frameworks like WordPress will simply keep working as
before, using existing application servers like php-fpm or apache.
New application servers based on `spawn` will only be able to use
modern, maintained frameworks which use proper synchronization
primitives (mutexes, channels, etc), and currently-legacy frameworks can
also be adapted with a little bit of effort, just like I adapted the
large codebase of MadelineProto over the course of a few months
(including both v2 and v3 migrations).
There are also ways to signal the async-safety of libraries, just
like in other languages (Go, Java, C, C++, etc.):
- Via a note in the documentation (this class is safe to use
concurrently), an example from the go prometheus library
(https://pkg.go.dev/github.com/prometheus/client_golang/prometheus):
`All exported functions and methods are safe to be used concurrently
unless specified otherwise.`
- Via a class attribute like `Concurrent` (closer to Rust's `Sync`
marker trait, though Rust's Sync [doesn't directly indicate
thread-safety](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=4ddb9fe86e126ee0e9fa76a006d648e2),
as race conditions on interior mutability are still possible on Sync
structs; in fact, Rust's Sync is more of a **negative** marker, where
its absence signals non-thread-safety, but its presence does not
guarantee thread safety, and non-thread-safe structs should be
marked `!Sync`, whereas a PHP `Concurrent` attribute could be a
**positive** attribute, explicitly guaranteeing thread safety).
Outside of PHP: in 2020, I started writing heavily async business logic
in Go, and was very positively surprised by the superior developer
experience, especially when using async: I was also very pleased to
learn that Go *encourages* (not forces) the use of the actor model
through channels.
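As a rough illustration of the actor model over channels, here is a minimal Go sketch. The `request` type and the `startActor` and `add` helpers are hypothetical names invented for this example. A single goroutine owns the state, and every other goroutine talks to it only through messages.

```go
package main

import "fmt"

// request is a hypothetical message: add delta to the balance and
// report the new balance on the reply channel.
type request struct {
	delta int
	reply chan int
}

// startActor launches the goroutine that owns the state. No other
// goroutine ever touches balance, so no mutex is needed.
func startActor() chan<- request {
	inbox := make(chan request)
	go func() {
		balance := 0
		for req := range inbox { // exits when inbox is closed
			balance += req.delta
			req.reply <- balance
		}
	}()
	return inbox
}

// add sends one message to the actor and waits for its answer.
func add(inbox chan<- request, delta int) int {
	reply := make(chan int)
	inbox <- request{delta: delta, reply: reply}
	return <-reply
}

func main() {
	inbox := startActor()
	add(inbox, 5)
	fmt.Println(add(inbox, 7)) // prints 12
	close(inbox)               // stops the actor goroutine
}
```

Because messages are processed one at a time by the owning goroutine, updates are serialized by construction, which is the "communicate instead of sharing memory" idea in its smallest form.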
In my professional Go experience, I wrote heavily concurrent and
massively parallel, high-load services using golang's stackful,
colorless approach.
I also used Rust to write async business logic, and suffered from the
same issues I suffered with amphp v2: large amounts of boilerplate await
keywords, heavily impacting developer experience.
To this day, I consider Go's approach superior to that of any other
language, confirmed by extensive development experience.
Regards,
Daniil Gentili.