On Sat, Nov 23, 2024, at 7:11 AM, Rob Landers wrote:
> Hello internals,
>
> Born from the Records RFC (https://wiki.php.net/rfc/records)
> discussion, I would like to introduce to you a competing RFC: Data
> Classes (https://wiki.php.net/rfc/dataclass).
>
> This adds a new class modifier: data. This modifier drastically changes
> how classes work, making them comparable by value instead of reference,
> and any mutations behave more like arrays than objects (by value). If
> desired, it can be combined with other modifiers, such as readonly, to
> enforce immutability.
>
> I've been playing with this feature for a few days now, and it is
> surprisingly intuitive to use. There is a (mostly) working
> implementation available on GitHub
> (https://github.com/php/php-src/pull/16904) if you want to have a go at
> it.
>
> Example:
>
> data class UserId { public function __construct(public int $id) {} }
>
> $user = new UserId(12);
> // later
> $admin = new UserId(12);
> if ($admin === $user) { // do something } // true
>
> Data classes are true value objects, with full copy-on-write optimizations:
>
> data class Point {
>     public function __construct(public int $x, public int $y) {}
>     public function add(Point $other): Point {
>         // illustrating value semantics, no copy yet
>         $previous = $this;
>         // a copy happens on the next line
>         $this->x = $this->x + $other->x;
>         $this->y = $this->y + $other->y;
>         assert($this !== $previous); // passes
>         return $this;
>     }
> }
>
> I think this would be an amazing addition to PHP.
>
> Sincerely,
>
> — Rob

Oh boy. Again, I think there's too much going on here, but I think that's because different people are operating under different definitions of what "value semantics" means. Let me try to break down what I think are the constituent parts.

1. Pass-by-value. This is what arrays, ints, strings, etc. do. When you pass a value to a function, what you get is logically a new value. It may be equal to the old one, it may be the same memory location as the old one, but that's hidden from you. Logically, it's a new value. (And if there's a shared memory location, CoW hides that from you, too.) The intent here is to avoid "spooky action at a distance" (SAAAD) (that is, changing a value inside a function is guaranteed to not have any effect on the function that called it).

2. Logical equality. This only applies to compound values (arrays and objects), but would imply checking equality by recursively checking equality on sub-elements. (Properties in the case of objects, keys in the case of arrays.)

3. Physical equality. This is what === does for objects, and checks that two variables refer to the same memory location. Physical equality implies logical equality, but not vice versa.

4. Immutability. A given variable's value cannot change.

5. Product types. A type that is based on two or more other types. (E.g., Point is a product of int and int.)

These are all circling around the same problem space, but are all different things. For instance, rigidly immutable values make pass-by-value irrelevant, while pass-by-value avoids SAAAD without needing immutability.

I think that's the key place where Rob's approach and Ilija's approach differ. Rob's approaches (records and dataclass) are trying to solve SAAAD through immutability, one way or another. Ilija's approach is trying to solve SAAAD through pass-by-value semantics.

By-value semantics would be really easy to implement by just auto-cloning an object at a function boundary.
However, that's also very wasteful, as the object probably won't be modified, making the clone just a memory hog. The issue is that detecting a modification on nested objects is not particularly easy, which is how Ilija ended up with an explicit syntax to mark such modification. (I personally dislike it, from a DX perspective, but I don't have any suggestions on how to avoid it. If someone else does, please speak up.)

Immutability semantics, as we've seen, seem easy but are actually quite logically complex once you get past the bare minimum. (The bare minimum is already provided by readonly classes. Problem solved.)

So I'm not sure we're all talking about solving the same problem, or solving it in the same way.

Moreover, I don't think we all agree on the use cases we're solving. Let me offer a few examples.

1. Fancy typed values

readonly class UserID {
    public function __construct(public int $id) {}
}

This is already mostly supported, as above, just a bit verbose. In this case, it makes sense that two equivalent objects are ==, and if we can make them === then that's a nice memory optimization, but not a requirement. In this case, we're really just providing additional typing, and the immutability is trivial (and already supported).

2. Product types (part 1)

class Point {
    public function __construct(public int $x, public int $y) {}
}

Now here's the interesting part. Should Point be immutable? Should modifications to Point inside a function affect values outside the function? MAYBE! It depends on the context. In most cases, probably not. However, consider a "registration/collection" use case of an event dispatcher:

class RegisterPluginsEvent {
    public function __construct(public array $pluginsToRegister) {}
}

This is a "data" class in that it is carrying data, and is not a service. However, we very clearly DO want SAAAD in this case. That's the whole reason it exists. Currently this case is solved by conventional classes, so I don't think there's anything to do here.

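To make the registration case concrete, here is a minimal sketch of how it behaves in PHP today, next to a plain array for contrast. Only RegisterPluginsEvent comes from the example above; the two functions and the plugin names are made up for illustration.

```php
<?php
declare(strict_types=1);

class RegisterPluginsEvent
{
    public function __construct(public array $pluginsToRegister) {}
}

// Hypothetical listener; a real dispatcher would call many of these.
function addCachePlugin(RegisterPluginsEvent $event): void
{
    // Objects are passed by handle, so the caller sees this change.
    $event->pluginsToRegister[] = 'cache';
}

// Hypothetical array-based equivalent, for contrast.
function addCachePluginToArray(array $plugins): void
{
    // Arrays are passed by value, so this only changes the local copy.
    $plugins[] = 'cache';
}

$event = new RegisterPluginsEvent(['logger']);
addCachePlugin($event);
var_dump($event->pluginsToRegister); // holds 'logger' and 'cache'

$list = ['logger'];
addCachePluginToArray($list);
var_dump($list); // still holds only 'logger'
```

The object version is the "spooky action" that the event pattern depends on; the array version is the pass-by-value behavior described in point 1 of the breakdown.
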
3. Product types (part 2)

Where it gets interesting is when you do need to modify an object, and propagate those changes, but NOT propagate the ability to change it. Consider:

class Circle {
    public function __construct(public Point $center, public int $radius) {}
}

$c = new Circle(new Point(1, 2), 5);

if ($some_user_data) {
    $c->center->x = 10;
}

draw($c);

Here, *we do want the ability to modify $c after construction*. However, we do NOT want to allow draw() to modify our $c. This case is currently unsolved in PHP.

As above, there are two approaches to solving it: Making $c immutable generally, or making a copy (immediately or delayed) when passing to draw().

Making $c immutable generally would, in this case, be bad, because we do want the ability to modify $c before passing it. It's just much more convenient than needing to compute everything ahead of time and pass it to the constructor like it's just a function.

4. Aggregate types

One of the main places that Ilija and I have discussed his structs proposal is collections[1]. In many languages, collections have both an in-place modifier and a clone-along-the-way modifier. For instance, sort() and sorted(), reverse() and reversed(), etc. (Details vary a little by language.)

Some languages also have both mutable and immutable versions of each collection type (Seq, Set, Map), with the in-place methods only available on the mutable variant. There are also methods to convert a mutable collection into an immutable one and vice versa, which (I believe) implies making a copy.

Kotlin does both of the above, and is the model that I have been planning to pursue in PHP, eventually.

Ilija has argued that if we can flag collection classes as pass-by-value, then we don't need the immutable versions at all. The only reason for the immutable versions to exist is to prevent SAAAD. If that's already prevented by the passing semantics, then we don't need an explicitly immutable collection.

So that would mean:

$c = new List();
$c->add(1); // in place mutation.
$c->add(3); // in place mutation.
$c->add(2); // in place mutation.

function doStuff(List $l) {
    $l->sort(); // in-place mutation of a value-passed value.
    // do stuff with l.
}

doStuff($c);

var_dump($c); // Still ordered 1, 3, 2

So a sorted() method or an ImmutableList class wouldn't be necessary. (I can see a use for sorted() anyway, to make it chainable, just like another recent RFC proposed for the existing sort() function. That's related but a separate question.)

This approach would not be possible if data/record/struct/whatever classes have *any* built-in immutability to them. They just become super cumbersome to work with. One way or another, you end up back at the withX() methods that we already have and use.

$c = new List();
$c = $c->add(1);
$c = $c->add(3);
$c = $c->add(2);
// ...

Eew. I can do that already today, and I don't want to.

Here's the important observation: Speaking as the leading functional programming in PHP fanboy, I don't really see much value at all to intra-function immutability. It's just... not useful in PHP. Immutability at function boundaries, that's super useful. But solving the problem at the object-immutability level is the wrong place in PHP. (It is arguably the right place in Haskell or ML, but PHP is not Haskell or ML.)

So IMO, the focus should be on just the function boundary semantics. The main issue is how to make that work without wonky new syntax. Again, I don't have a good answer, but would kindly request one. :-)

Finally, there's the question of equality. Be aware, PHP *already does value equality for objects*:

https://3v4l.org/67ho1

The issue isn't that it's not there, it's that it cannot be controlled. I am not convinced that overriding === to mean logical equality rather than physical equality, but only for data objects, is wise. And we already have == handled. (I use that fact in my PHPUnit tests all the time.)

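For readers who don't want to follow the link, a minimal sketch of the comparison rules as they stand today. The Money class is hypothetical and exists only for this illustration; it is not taken from the linked snippet.

```php
<?php
declare(strict_types=1);

// Hypothetical class, used only to illustrate today's comparison rules.
final class Money
{
    public function __construct(
        public int $amount,
        public string $currency,
    ) {}
}

$a = new Money(5, 'USD');
$b = new Money(5, 'USD');
$alias = $a;

var_dump($a == $b);      // bool(true): same class, equal property values
var_dump($a === $b);     // bool(false): two distinct instances
var_dump($a === $alias); // bool(true): both variables hold the same instance
```

So == already gives logical equality and === gives identity; neither can be customized per class.
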
What is missing is the ability to control how that == comparison is made.

class Rect {
    private int $area;

    public function __construct(public readonly int $h, public readonly int $w) {}

    public function area(): int {
        // Compute lazily and cache the result.
        return $this->area ??= $this->h * $this->w;
    }
}

$r1 = new Rect(4, 5);
$r2 = new Rect(4, 5);

print $r1->area();

var_dump($r1 == $r2); // What happens here?

Presumably, we'd want those to be equal without having to compute $area on $r2. Right now, that's impossible, and those objects would not be equal, because $area is set on $r1 but still uninitialized on $r2. Fixing that has... nothing to do with value semantics at all. It has to do with operator overloading, and I'm already on record that I am very in favor of addressing that.

I hope that gives a better lay of the land for everyone in this thread.

--Larry Garfield

[1] https://thephp.foundation/blog/2024/08/19/state-of-generics-and-collections/#collections