Hi all.

Reposting here what I already posted (https://github.com/true-async/php-true-async-rfc/discussions/8#discussioncomment-15074303) in the discussion on php-true-async-rfc (the discussion should really move there from this list IMO).


Senior software artisan here with 15 years of PHP experience, 8 years of async PHP experience (amphp v2, then v3), 5 years of Go experience, 3 years of Rust experience, and much more.

My recommendation for async PHP is:

- Stackful coroutines
- Colorless functions
- Existing IO functions should be async
- Some memory isolation: shared statics, maybe with a separate attribute (i.e. allow fiber-local statics, which have their uses, but also keep normal statics as they are today to allow for caching). While getting rid of globals (not statics) in general would be nice, I feel it would be out of scope for this RFC. Race conditions are **not** an issue, just as they are not an issue in the vast majority of languages, because there is a variety of tools and models to deal with them: channels (Go), mutexes (Go, amphp, Rust, C, C++, etc.), the actor model (any language), etc.

This model matches the status quo in golang and amphp v3: in my 8 years of experience writing both async **business logic** and **abstraction logic** in multiple languages, this is the best concurrency model.
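
As a rough illustration of what this looks like in Go, here is a minimal sketch. All names (`Cache`, `fetch`, `LoadAll`) are hypothetical and the sleep only simulates network latency. `Get` has no async marker in its signature, can be called directly or from many goroutines, and protects the shared map with a mutex:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Cache is shared state guarded by a mutex, comparable to a "normal"
// static shared between coroutines. (Hypothetical type for illustration.)
type Cache struct {
	mu   sync.Mutex
	data map[string]string
}

func NewCache() *Cache {
	return &Cache{data: make(map[string]string)}
}

// fetch stands in for an IO call; the sleep simulates latency.
// Its signature carries no async/await marker.
func fetch(key string) string {
	time.Sleep(5 * time.Millisecond)
	return "value-for-" + key
}

// Get returns the cached value, fetching it on a miss. Callers do not
// need to know whether it performs IO. For simplicity the lock is held
// during the fetch, which serializes cache misses.
func (c *Cache) Get(key string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.data[key]; ok {
		return v
	}
	v := fetch(key)
	c.data[key] = v
	return v
}

// LoadAll calls Get concurrently, one goroutine per key, and returns
// the number of distinct entries in the cache afterwards.
func LoadAll(c *Cache, keys []string) int {
	var wg sync.WaitGroup
	for _, k := range keys {
		wg.Add(1)
		go func(key string) {
			defer wg.Done()
			c.Get(key)
		}(k)
	}
	wg.Wait()
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.data)
}

func main() {
	c := NewCache()
	fmt.Println(c.Get("a"))                               // plain, direct call
	fmt.Println(LoadAll(c, []string{"a", "b", "c", "a"})) // same function, called concurrently
}
```

If `Get` later gains or loses IO internally, none of its call sites have to change, which is the point of the colorless approach.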

Some evidence:

I maintain [MadelineProto](https://github.com/danog/MadelineProto/), the biggest PHP MTProto client.

MTProto is an async binary protocol over TCP (no HTTP involved); it requires heavy caching in order for clients to function correctly, and a lot of race-heavy abstractions.

MadelineProto is a framework which contains a fully async MTProto client.

I migrated MadelineProto to async PHP with amphp v2 (colored, stackless async) in 2018, and then to amphp v3 (colorless, stackful async) in 2022.

- The biggest problem while using v2 was the colored approach, which requires the use of await (actually `yield`; I'll say await for clarity from here on) every time a function may become async. This is the status quo in some languages like JS, but in reality **it becomes a giant pain when writing a lot of business logic**. On top of that, there were constant **breaking changes** every time a method that previously wasn't async (i.e. had no network/IO logic) suddenly became async (thus requiring the use of await).

  In MadelineProto, I worked around this with a custom coroutine runtime (replacing that of amphp v2), which allowed users to `await` even non-async functions: by then forcing users to always await all methods of the framework, I managed to avoid breaking changes every time an abstraction became async.

  Clearly, this was a crutch, made to work around a big usability issue for end users, caused by colored functions.

  Another issue was the stackless approach, which essentially forced the use of `call` to spawn a new coroutine (creating a new generator) every time an async function was invoked: I again worked around this in my custom coroutine runtime, allowing users to spawn a new coroutine simply by using `await` (`yield`).

  This in turn led to the creation of a new generator object every time `await` was used: I was also able to work around this, transforming every `yield` to a `yield from`: this reduced the overhead, but it was clearly yet another crutch, made to transform a stackless approach into a more usable stackful approach.

- The switch to amphp v3 finally got rid of the colored approach, switching to a stackful, colorless approach.

  This was a huge improvement in terms of developer UX: no more worrying about when to use `await`, no more custom coroutine runtimes to implement stackfulness to avoid using `call()` *and* `await` every time an async function is called, all **without impacting safety**.

  Amphp v3's model is pretty close to golang's model, and mirrors go's ease of use.


- When migrating the MadelineProto framework to amphp, right away, it was clear to me that with concurrency, race conditions would be an issue.  Amphp v2 already offered synchronization primitives in the form of mutexes, which are more than enough to guarantee safety.

   However, I wanted an easier approach for me as the library developer, which is why I, right away, chose to communicate instead of sharing memory, by adopting the [actor model](https://en.wikipedia.org/wiki/Actor_model) for MadelineProto's internal core modules (IO, update handling, any moderately large logic requiring synchronization).

   Mutexes are still used for isolated, smaller logic, as they provide equivalent safety with less boilerplate compared to channels/actors.


   **A shared-nothing approach only adds needless complexity and overhead, when the alternative is simply to isolate logic needing synchronization in a separate actor, or to add a few mutexes in key places, as is already done in many languages like Go, Rust (multithreaded with inner mutability), Java, C, C++, etc.**

- An issue that is still present to this day in amphp is the requirement to switch to their own IO API in order to write async logic (admittedly a lot better than PHP's stdlib): this is a **huge** issue especially for PHP developers that have only ever used PHP's own stdlib (curl, etc), or common libraries like guzzle.

  For example: MadelineProto exposes MTProto events through an event handler, which is a class with appropriately decorated user-defined methods which handle chosen events concurrently (each event is handled in a new coroutine).

  If users use native PHP functions within those methods, they block execution of:
  - All other events
  - Most importantly, the library itself, which **requires** the periodic execution of time-sensitive operations in order to maintain the connection with the MTProto servers (if MTProto updates aren't acked within a specific time frame, they are either resent or the connection is terminated by the server, plus `ping_delay_disconnect` must be emitted periodically to signal liveness of the connection itself)

  To work around this, I run static analysis on code users write, warning them if they use non-async stdlib functions and libraries in the event handler, suggesting async alternatives (and also offering simple to use synchronization primitives).

  To this day, warnings emitted by the static analysis are the single biggest hurdle for thousands of new users of my framework, who simply do not understand why they can't just use `file_get_contents` or `curl` instead of amphp/http-client, amphp/file, etc. The same thing will happen to PHP if a split approach is chosen (the old stdlib remains blocking, and a new async stdlib is made).

  To those who think making all of the PHP stdlib async will break stuff, I say: **it is a misconception that making all PHP IO functions async will break existing applications**. Existing legacy frameworks like WordPress will simply keep working as before, using existing application servers like php-fpm or apache.

  New application servers based on `spawn` will only be able to use modern, maintained frameworks which use proper synchronization primitives (mutexes, channels, etc), and currently-legacy frameworks can also be adapted with a little bit of effort, just like I adapted the large codebase of MadelineProto over the course of a few months (including both v2 and v3 migrations).

  There are also ways to signal the async-safety of libraries, just like in other languages (Go, Java, C, C++, etc.):

  - Via a note in the documentation (this class is safe to use concurrently). An example from the Go prometheus library (https://pkg.go.dev/github.com/prometheus/client_golang/prometheus): `All exported functions and methods are safe to be used concurrently unless specified otherwise.`
  - Via a class attribute like `Concurrent` (closer to Rust's `Sync` marker trait, though Rust's `Sync` [doesn't directly indicate thread-safety](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=4ddb9fe86e126ee0e9fa76a006d648e2), as race conditions on interior mutability are still possible on `Sync` structs; in fact, Rust's `Sync` is more of a **negative** marker, where its absence signals non-thread-safety but its presence does not guarantee thread safety, and non-thread-safe structs should be marked `!Sync`, whereas a PHP `Concurrent` attribute could be a **positive** attribute, explicitly guaranteeing thread safety).

Outside of PHP: in 2020, I started writing heavily async business logic in Go, and was very positively surprised by the superior developer experience, especially when using async: I was also very pleased to learn that Go *encourages* (not forces) the use of the actor model through channels.
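
A minimal sketch of that actor style in Go, with hypothetical names (`request`, `StartCounter`, `Add`): one goroutine owns the state, and everyone else talks to it through a channel, so no mutex is needed around the counter itself.

```go
package main

import "fmt"

// request is a message sent to the counter actor.
type request struct {
	delta int
	reply chan int
}

// StartCounter spawns the actor goroutine and returns its inbox.
// The total is owned exclusively by that goroutine.
func StartCounter() chan<- request {
	inbox := make(chan request)
	go func() {
		total := 0
		for req := range inbox {
			total += req.delta
			req.reply <- total
		}
	}()
	return inbox
}

// Add sends a message to the actor and waits for the updated total.
func Add(inbox chan<- request, delta int) int {
	reply := make(chan int)
	inbox <- request{delta: delta, reply: reply}
	return <-reply
}

// RunWorkers starts n goroutines that each add 1 through the actor,
// waits for all of them, and returns the final total.
func RunWorkers(n int) int {
	inbox := StartCounter()
	defer close(inbox) // lets the actor goroutine exit

	done := make(chan struct{})
	for i := 0; i < n; i++ {
		go func() {
			Add(inbox, 1)
			done <- struct{}{}
		}()
	}
	for i := 0; i < n; i++ {
		<-done
	}
	return Add(inbox, 0) // read the total without changing it
}

func main() {
	fmt.Println(RunWorkers(100)) // prints 100
}
```

For small pieces of state a mutex is usually less code; the channel-based version pays off when the owning goroutine has more involved logic to serialize.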

In my professional Go experience, I wrote heavily concurrent and massively parallel, high-load services using golang's stackful, colorless approach.

I also used Rust to write async business logic, and suffered from the same issues I suffered with amphp v2: large amounts of boilerplate await keywords, heavily impacting developer experience.

To this day, I consider Go's approach superior to that of any other language, confirmed by extensive development experience.


Regards,

Daniil Gentili.
