On Mon, Apr 1, 2024 at 9:20 PM Ilija Tovilo <tovilo.il...@gmail.com> wrote:

> Hi everyone!
>
> I'd like to introduce an idea I've played around with for a couple of
> weeks: Data classes, sometimes called structs in other languages (e.g.
> Swift and C#).
>
> In a nutshell, data classes are classes with value semantics.
> Instances of data classes are implicitly copied when assigned to a
> variable, or when passed to a function. When the new instance is
> modified, the original instance remains untouched. This might sound
> familiar: It's exactly how arrays work in PHP.
>
> ```php
> $a = [1, 2, 3];
> $b = $a;
> $b[] = 4;
> var_dump($a); // [1, 2, 3]
> var_dump($b); // [1, 2, 3, 4]
> ```
>
> You may think that copying the array on each assignment is expensive,
> and you would be right. PHP uses a trick called copy-on-write, or CoW
> for short. `$a` and `$b` actually share the same array until `$b[] =
> 4;` modifies it. It's only at this point that the array is copied and
> replaced in `$b`, so that the modification doesn't affect `$a`. As
> long as a variable is the sole owner of a value, or none of the
> variables modify the value, no copy is needed. Data classes use the
> same mechanism.
>
> But why value semantics in the first place? There are two major flaws
> with by-reference semantics for data structures:
>
> 1. It's very easy to forget to clone data that is referenced somewhere
> else before modifying it. This will lead to "spooky actions at a
> distance". Having recently used JavaScript (where all data structures
> have by-reference semantics) for an educational IR optimizer,
> accidental mutations of shared arrays/maps/sets were my primary source
> of bugs.
> 2. Defensive cloning (to avoid issue 1) will lead to useless work when
> the value is not referenced anywhere else.
>
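
To make the two issues concrete, here is a minimal sketch in current PHP (no data classes involved). `Settings` and `enableDebug()` are hypothetical names chosen only for illustration.

```php
<?php
// Hypothetical class, for illustration only.
final class Settings {
    public array $flags = [];
}

// Mutates its argument; the caller may not expect that.
function enableDebug(Settings $s): Settings {
    $s->flags['debug'] = true;
    return $s;
}

$original = new Settings();
$copy = enableDebug($original); // no clone: both names refer to one object

var_dump($original->flags);     // 'debug' => true, changed "at a distance" (issue 1)

// Defensive cloning avoids the surprise, but pays for a copy
// even when nobody else holds the object (issue 2).
$safe = enableDebug(clone $original);
```
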
> PHP offers readonly properties and classes to address issue 1.
> However, they further promote issue 2 by making it impossible to
> modify values without cloning them first, even if we know they are not
> referenced anywhere else. Some APIs further exacerbate the issue by
> requiring multiple copies for multiple modifications (e.g.
> `$response->withStatus(200)->withHeader('X-foo', 'foo');`).
>
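
As a rough sketch of the wither pattern mentioned above, assuming a hypothetical `Response` class (this is not a real PSR-7 implementation, and the `$clones` counter exists only to make the copies visible):

```php
<?php
// Hypothetical immutable value object, for illustration only.
final class Response {
    public static int $clones = 0; // counts copies, for demonstration only

    public function __construct(
        public readonly int $status = 200,
        public readonly array $headers = [],
    ) {}

    public function withStatus(int $status): self {
        self::$clones++;
        return new self($status, $this->headers);
    }

    public function withHeader(string $name, string $value): self {
        self::$clones++;
        return new self($this->status, array_merge($this->headers, [$name => $value]));
    }
}

$response = (new Response())
    ->withStatus(404)
    ->withHeader('X-foo', 'foo');

echo Response::$clones, "\n"; // 2
```

Each modification allocates a new object, even though the intermediate instances are never referenced anywhere else.
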
> As you may have noticed, arrays already solve both of these issues
> through CoW. Data classes allow implementing arbitrary data structures
> with the same value semantics in core, extensions or userland. For
> example, a `Vector` data class may look something like the following:
>
> ```php
> data class Vector {
>     private $values;
>
>     public function __construct(...$values) {
>         $this->values = $values;
>     }
>
>     public mutating function append($value) {
>         $this->values[] = $value;
>     }
> }
>
> $a = new Vector(1, 2, 3);
> $b = $a;
> $b->append!(4);
> var_dump($a); // Vector(1, 2, 3)
> var_dump($b); // Vector(1, 2, 3, 4)
> ```
>
> An internal Vector implementation might offer a faster and stricter
> alternative to arrays (e.g. Vector from php-ds).
>
>
Exciting times to be a PHP Developer!


> Some other things to note about data classes:
>
> * Data classes are ordinary classes, and as such may implement
> interfaces, methods and more. I have not decided whether they should
> support inheritance.
>

I'd argue in favor of not including inheritance in the first version.
Removing inheritance later would be an unacceptable BC break, whereas leaving
it out of the first stable release gives users a chance to evaluate whether
it's something we would sorely miss.


> * Mutating method calls on data classes use a slightly different
> syntax: `$vector->append!(42)`. All methods mutating `$this` must be
> marked as `mutating`. The reason for this is twofold: 1. It signals to
> the caller that the value is modified. 2. It allows `$vector` to be
> cloned before knowing whether the method `append` is modifying, which
> hugely reduces implementation complexity in the engine.
>

I'm not sure I understood this one. Do you mean that the `!` modifier
at the call site lets the engine clone the variable before it even checks
whether `append()` has been tagged as mutating? From the outside it
looks odd that a clone would happen ahead of time when we are talking about
copy-on-write. Would this syntax break for non-mutating methods?
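
One way to picture the call-site separation in today's PHP, as a userland sketch only: `Vec` and `mutate()` are hypothetical names, and unlike the proposal this helper always clones. Going by the quoted description, the engine would only separate the value when it is shared.

```php
<?php
// Hypothetical stand-in for a data class; `data class` and `->append!()`
// do not exist in released PHP.
final class Vec {
    public array $values;

    public function __construct(...$values) {
        $this->values = $values;
    }

    public function append($value): void {
        $this->values[] = $value;
    }
}

// Hypothetical helper emulating `$var->method!(...)`: separate first, then call.
function mutate(object &$var, string $method, ...$args): void {
    $var = clone $var;       // always copies here; the proposal copies only when shared
    $var->$method(...$args);
}

$a = new Vec(1, 2, 3);
$b = $a;                     // ordinary objects: $a and $b alias one instance
mutate($b, 'append', 4);     // $b is separated before append() runs

var_dump($a->values); // [1, 2, 3]
var_dump($b->values); // [1, 2, 3, 4]
```
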


> * Data classes customize identity (`===`) comparison, in the same way
> arrays do. Two data objects are identical if all their properties are
> identical (including order for dynamic properties).
> * Sharing data classes by-reference is possible using references, as
> you would for arrays.
> * We may decide to auto-implement `__toString` for data classes,
> amongst other things. I am still undecided whether this is useful for
> PHP.
> * Data classes protect from interior mutability. More concretely,
> mutating nested data objects stored in a `readonly` property is not
> legal, whereas it would be if they were ordinary objects.
> * In the future, it should be possible to allow using data classes in
> `SplObjectStorage`. However, because hashing is complex, this will be
> postponed to a separate RFC.
>
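
For reference, this is how `===` behaves in current PHP for arrays versus ordinary objects. Under the quoted proposal, data objects would follow the array behaviour; the snippet itself uses only existing language features.

```php
<?php
$x = ['a' => 1, 'b' => 2];
$y = ['a' => 1, 'b' => 2];
$z = ['b' => 2, 'a' => 1];

var_dump($x === $y); // bool(true): same key/value pairs in the same order
var_dump($x === $z); // bool(false): same pairs, different order
var_dump($x == $z);  // bool(true): loose comparison ignores order

$p = new stdClass();
$p->a = 1;
$q = new stdClass();
$q->a = 1;

var_dump($p === $q); // bool(false): distinct instances
var_dump($p == $q);  // bool(true): same class and equal properties
```
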
> One known gotcha is that we cannot trivially enforce placement of
> `mutating` on methods without a performance hit. It is the
> responsibility of the user to correctly mark such methods.
>
> Here's a fully functional PoC, excluding JIT:
> https://github.com/php/php-src/pull/13800
>
> Let me know what you think. I will start working on an RFC draft once
> work on property hooks concludes.
>
> Ilija
>

Looking forward to this!!!

-- 
Marco Deleu
