On Tue, Apr 2, 2024 at 2:20 AM Ilija Tovilo <tovilo.il...@gmail.com> wrote:
>
> Hi everyone!
>
> I'd like to introduce an idea I've played around with for a couple of weeks: Data classes, sometimes called structs in other languages (e.g. Swift and C#).
>
> In a nutshell, data classes are classes with value semantics. Instances of data classes are implicitly copied when assigned to a variable, or when passed to a function. When the new instance is modified, the original instance remains untouched. This might sound familiar: It's exactly how arrays work in PHP.
>
> ```php
> $a = [1, 2, 3];
> $b = $a;
> $b[] = 4;
> var_dump($a); // [1, 2, 3]
> var_dump($b); // [1, 2, 3, 4]
> ```
>
> You may think that copying the array on each assignment is expensive, and you would be right. PHP uses a trick called copy-on-write, or CoW for short. `$a` and `$b` actually share the same array until `$b[] = 4;` modifies it. It's only at this point that the array is copied and replaced in `$b`, so that the modification doesn't affect `$a`. As long as a variable is the sole owner of a value, or none of the variables modify the value, no copy is needed. Data classes use the same mechanism.
>
> But why value semantics in the first place? There are two major flaws with by-reference semantics for data structures:
>
> 1. It's very easy to forget to clone data that is referenced somewhere else before modifying it. This leads to "spooky action at a distance". Having recently used JavaScript (where all data structures have by-reference semantics) for an educational IR optimizer, accidental mutations of shared arrays/maps/sets were my primary source of bugs.
> 2. Defensive cloning (to avoid issue 1) leads to useless work when the value is not referenced anywhere else.
>
> PHP offers readonly properties and classes to address issue 1. However, they further promote issue 2 by making it impossible to modify values without cloning them first, even if we know they are not referenced anywhere else. Some APIs further exacerbate the issue by requiring multiple copies for multiple modifications (e.g. `$response->withStatus(200)->withHeader('X-foo', 'foo');`).
>
> As you may have noticed, arrays already solve both of these issues through CoW. Data classes allow implementing arbitrary data structures with the same value semantics in core, extensions or userland. For example, a `Vector` data class may look something like the following:
>
> ```php
> data class Vector {
>     private $values;
>
>     public function __construct(...$values) {
>         $this->values = $values;
>     }
>
>     public mutating function append($value) {
>         $this->values[] = $value;
>     }
> }
>
> $a = new Vector(1, 2, 3);
> $b = $a;
> $b->append!(4);
> var_dump($a); // Vector(1, 2, 3)
> var_dump($b); // Vector(1, 2, 3, 4)
> ```
>
> An internal Vector implementation might offer a faster and stricter alternative to arrays (e.g. Vector from php-ds).
>
> Some other things to note about data classes:
>
> * Data classes are ordinary classes, and as such may implement interfaces, define methods, and more. I have not decided whether they should support inheritance.
> * Mutating method calls on data classes use a slightly different syntax: `$vector->append!(42)`. All methods mutating `$this` must be marked as `mutating`. The reason for this is twofold: 1. It signals to the caller that the value is modified. 2. It allows `$vector` to be cloned before knowing whether the method `append` is modifying, which hugely reduces implementation complexity in the engine.
> * Data classes customize identity (`===`) comparison, in the same way arrays do. Two data objects are identical if all their properties are identical (including order for dynamic properties).
> * Sharing data classes by reference is possible using references, as you would for arrays.
> * We may decide to auto-implement `__toString` for data classes, amongst other things. I am still undecided whether this is useful for PHP.
> * Data classes protect against interior mutability. More concretely, mutating nested data objects stored in a `readonly` property is not legal, whereas it would be if they were ordinary objects.
> * In the future, it should be possible to allow using data classes in `SplObjectStorage`. However, because hashing is complex, this will be postponed to a separate RFC.
>
> One known gotcha is that we cannot trivially enforce placement of `mutating` on methods without a performance hit. It is the responsibility of the user to correctly mark such methods.
>
> Here's a fully functional PoC, excluding JIT:
> https://github.com/php/php-src/pull/13800
>
> Let me know what you think. I will start working on an RFC draft once work on property hooks concludes.
>
> Ilija
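For comparison, the two flaws with by-reference semantics described in the quoted message can be reproduced in today's PHP, without data classes. This is a minimal sketch assuming PHP 8.1 or later; `MutablePoint`, `Point`, `withX()` and `withY()` are hypothetical names, not part of the proposal or of PHP itself.

```php
<?php
// Minimal sketch in current PHP (8.1+), without data classes.
// MutablePoint, Point, withX() and withY() are hypothetical names.

// Issue 1: ordinary objects are shared by handle.
final class MutablePoint
{
    public function __construct(public int $x, public int $y) {}
}

$a = new MutablePoint(1, 2);
$b = $a;         // copies the handle; $a and $b refer to the same object
$b->x = 10;      // so the change is visible through $a as well
echo $a->x, PHP_EOL; // 10

// Issue 2: readonly "withers" allocate a new object on every call,
// even when nothing else holds a reference to the original.
final class Point
{
    public function __construct(
        public readonly int $x,
        public readonly int $y,
    ) {}

    public function withX(int $x): self
    {
        return new self($x, $this->y);
    }

    public function withY(int $y): self
    {
        return new self($this->x, $y);
    }
}

$p = new Point(1, 2);
$q = $p->withX(10)->withY(20); // two new objects; the intermediate one is used only once
echo $p->x, ' ', $q->x, ' ', $q->y, PHP_EOL; // 1 10 20

// Arrays avoid both issues via copy-on-write.
$arr = ['x' => 1, 'y' => 2];
$copy = $arr;    // no copy yet; both variables share one array
$copy['x'] = 10; // the array is separated (copied) only here
echo $arr['x'], ' ', $copy['x'], PHP_EOL; // 1 10
```

Arrays get this behavior for free; the proposal would extend the same copy-on-write semantics to objects.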
Neat! I've been playing around with "value-like" objects for a while now: https://github.com/withinboredom/time

Having inheritance supported would be useful. For example, consider an ID type:

    data class Id {
        public function __construct(public string $id) {}
    }

Maybe you want to extend it to a UserId:

    data class UserId extends Id {}

Now you can't accidentally pass a VideoId as a UserId, but underlying ORMs can still use both as an Id.

Robert Landers
Software Engineer
Utrecht NL
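The ID-type idea above can be approximated in current PHP with ordinary readonly classes (PHP 8.2+) standing in for the proposed `data class`. This is a sketch under that assumption; `VideoId`, `findUser()` and `idToString()` are hypothetical names. It shows the type check rejecting a `VideoId` where a `UserId` is expected, while generic code typed against `Id` accepts both.

```php
<?php
// Sketch in current PHP (8.2+): ordinary readonly classes stand in for the
// proposed `data class`. VideoId, findUser() and idToString() are hypothetical.

readonly class Id
{
    public function __construct(public string $id) {}
}

final readonly class UserId extends Id {}
final readonly class VideoId extends Id {}

// Only user IDs are accepted; any other Id subclass raises a TypeError.
function findUser(UserId $id): string
{
    return "user {$id->id}";
}

// Generic code (e.g. an ORM layer) can accept every kind of Id.
function idToString(Id $id): string
{
    return $id->id;
}

$user = new UserId('42');
$video = new VideoId('42');

echo findUser($user), PHP_EOL;    // user 42
echo idToString($video), PHP_EOL; // 42

try {
    findUser($video);
} catch (TypeError $e) {
    echo 'TypeError: a VideoId is not a UserId', PHP_EOL;
}

// `==` compares class and property values; `===` compares instance identity.
var_dump($user == new UserId('42'));  // bool(true)
var_dump($user == $video);            // bool(false), different classes
var_dump($user === new UserId('42')); // bool(false), distinct instances
```

Unlike the proposed data classes, `===` on these ordinary objects still means "same instance", so two equal IDs only compare equal with `==`.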