On Sat, Nov 23, 2024, at 7:11 AM, Rob Landers wrote:
> Hello internals,
>
> Born from the Records RFC (https://wiki.php.net/rfc/records)
> discussion, I would like to introduce to you a competing RFC: Data
> Classes (https://wiki.php.net/rfc/dataclass).
>
> This adds a new class modifier: data. This modifier drastically changes
> how classes work, making them comparable by value instead of reference,
> and any mutations behave more like arrays than objects (by value). If
> desired, it can be combined with other modifiers, such as readonly, to
> enforce immutability.
>
> I've been playing with this feature for a few days now, and it is
> surprisingly intuitive to use. There is a (mostly) working
> implementation available on GitHub
> (https://github.com/php/php-src/pull/16904) if you want to have a go at
> it.
>
> Example:
>
> data class UserId { public function __construct(public int $id) {} }
>
> $user = new UserId(12);
> // later
> $admin = new UserId(12);
> if ($admin === $user) { /* do something */ } // true
>
> Data classes are true value objects, with full copy-on-write optimizations:
>
> data class Point {
>     public function __construct(public int $x, public int $y) {}
>     public function add(Point $other): Point {
>         // illustrating value semantics, no copy yet
>         $previous = $this;
>         // a copy happens on the next line
>         $this->x = $this->x + $other->x;
>         $this->y = $this->y + $other->y;
>         assert($this !== $previous); // passes
>         return $this;
>     }
> }
>
> I think this would be an amazing addition to PHP.
>
> Sincerely,
>
> — Rob
Oh boy. Again, I think there's too much going on here, but I think that's
because different people are operating under different definitions of what
"value semantics" means. Let me try to break down what I think are the
constituent parts.
1. Pass-by-value. This is what arrays, ints, strings, etc. do. When you pass
a value to a function, what you get is logically a new value. It may be equal
to the old one, it may be the same memory location as the old one, but that's
hidden from you. Logically, it's a new value. (And if there's a shared memory
location, CoW hides that from you, too.) The intent here is to avoid "spooky
action at a distance" (SAAAD) (that is, changing a value inside a function is
guaranteed to not have any effect on the function that called it).
2. Logical equality. This only applies to compound values (arrays and
objects), but would imply checking equality by recursively checking equality on
sub-elements. (Properties in the case of objects, keys in the case of arrays.)
3. Physical equality. This is what === does for objects: it checks that two
variables refer to the same object instance. Physical equality implies logical
equality, but not vice versa.
4. Immutability. A given variable's value cannot change.
5. Product types. A type that is based on two or more other types. (E.g., Point
is a product of int and int.)
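Item 1 and its absence for objects can both be observed in current PHP. A
minimal sketch (the Counter class and the two functions are hypothetical names,
made up for illustration): arrays are passed by value, while objects are passed
by handle, which is where the spooky action comes from.

```php
<?php
// Hypothetical names, for illustration only.
final class Counter {
    public int $n = 0;
}

function bumpArray(array $a): void {
    $a['n']++;          // modifies the function's own copy (CoW separates here)
}

function bumpObject(Counter $c): void {
    $c->n++;            // modifies the caller's object: SAAAD
}

$arr = ['n' => 0];
$obj = new Counter();

bumpArray($arr);
bumpObject($obj);

var_dump($arr['n']);    // int(0)
var_dump($obj->n);      // int(1)
```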
These are all circling around the same problem space, but are all different
things. For instance, rigidly immutable values make pass-by-value irrelevant,
while pass-by-value avoids SAAAD without needing immutability.
I think that's the key place where Rob's approach and Ilija's approach differ.
Rob's proposals (records and dataclass) are trying to solve SAAAD through
immutability, one way or another. Ilija's approach is trying to solve SAAAD
through pass-by-value semantics.
By-value semantics would be really easy to implement by just auto-cloning an
object at a function boundary. However, that's also very wasteful, as the
object probably won't be modified, making the clone just a memory hog. The
issue is that detecting a modification on nested objects is not particularly
easy, which is how Ilija ended up with an explicit syntax to mark such
modification. (I personally dislike it, from a DX perspective, but I don't
have any suggestions on how to avoid it. If someone else does, please speak
up.)
Immutability semantics, as we've seen, seem easy but are actually quite
logically complex once you get past the bare minimum. (The bare minimum is
already provided by readonly classes. Problem solved.)
So I'm not sure we're all talking about solving the same problem, or solving it
in the same way.
Moreover, I don't think we all agree on the use cases we're solving. Let me
offer a few examples.
1. Fancy typed values
readonly class UserID {
    public function __construct(public int $id) {}
}
This is already mostly supported, as above, just a bit verbose. In this case,
it makes sense that two equivalent objects are ==, and if we can make them ===
then that's a nice memory optimization, but not a requirement. In this case,
we're really just providing additional typing, and the immutability is trivial
(and already supported).
2. Product types (part 1)
class Point {
    public function __construct(public int $x, public int $y) {}
}
Now here's the interesting part. Should Point be immutable? Should
modifications to Point inside a function affect values outside the function?
MAYBE! It depends on the context. In most cases, probably not. However,
consider a "registration/collection" use case of an event dispatcher:
class RegisterPluginsEvent {
    public function __construct(public array $pluginsToRegister) {}
}
This is a "data" class in that it is carrying data, and is not a service.
However, we very clearly DO want SAAAD in this case. That's the whole reason
it exists. Currently this case is solved by conventional classes, so I don't
think there's anything to do here.
3. Product types (part 2)
Where it gets interesting is when you do need to modify an object, and
propagate those changes, but NOT propagate the ability to change it. Consider:
class Circle {
    public function __construct(public Point $center, public int $radius) {}
}
$c = new Circle(new Point(1, 2), 5);
if ($some_user_data) {
    $c->center->x = 10;
}
draw($c);
Here, *we do want the ability to modify $c after construction*. However, we do
NOT want to allow draw() to modify our $c. This case is currently unsolved in
PHP.
As above, there are two approaches to solving it: Making $c immutable generally,
or making a copy (immediately or delayed) when passing to draw(). Making $c
immutable generally would, in this case, be bad, because we do want the ability
to modify $c before passing it. It's just much more convenient than needing to
compute everything ahead of time and pass it to the constructor like it's just
a function.
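For comparison, the "immediate copy" option can be approximated by hand today.
This is only a sketch under assumptions: draw() is a stand-in that deliberately
misbehaves, and Circle needs a __clone() method because clone is shallow by
default.

```php
<?php
class Point {
    public function __construct(public int $x, public int $y) {}
}

class Circle {
    public function __construct(public Point $center, public int $radius) {}

    // clone is shallow, so nested objects must be copied explicitly.
    public function __clone() {
        $this->center = clone $this->center;
    }
}

// Stand-in for a draw() that modifies its argument.
function draw(Circle $c): void {
    $c->center->x = 999;
}

$c = new Circle(new Point(1, 2), 5);
$c->center->x = 10;      // we can still modify $c before passing it

draw(clone $c);          // the caller must remember to copy, every time

var_dump($c->center->x); // int(10)
```

The copy here is eager and relies on caller discipline, which is exactly the
cost that a delayed (copy-on-write) approach would try to avoid.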
4. Aggregate types
One of the main places that Ilija and I have discussed his structs proposal is
collections[1]. In many languages, collections have both an in-place modifier
and a clone-along-the-way modifier. For instance, sort() and sorted(),
reverse() and reversed(), etc. (Details vary a little by language.) Some
languages also have both mutable and immutable versions of each collection type
(Seq, Set, Map), with the in-place methods only available on the mutable
variant. There are then also methods to convert a mutable collection into an
immutable one and vice versa, which (I believe) implies making a copy. Kotlin
does both of the above, and is the model that I have been planning to pursue in
PHP, eventually.
Ilija has argued that if we can flag collection classes as pass-by-value, then
we don't need the immutable versions at all. The only reason for the immutable
versions to exist is to prevent SAAAD. If that's already prevented by the
passing semantics, then we don't need an explicitly immutable collection.
So that would mean:
$c = new List();
$c->add(1); // in place mutation.
$c->add(3); // in place mutation.
$c->add(2); // in place mutation.
function doStuff(List $l) {
    $l->sort(); // in-place mutation of a value-passed value.
    // do stuff with l.
}
doStuff($c);
var_dump($c); // Still ordered 1, 3, 2
So a sorted() method or an ImmutableList class wouldn't be necessary. (I can
see a use for sorted() anyway, to make it chainable, just like another recent
RFC proposed for the existing sort() function. That's related but a separate
question.)
This approach would not be possible if data/record/struct/whatever classes have
*any* built-in immutability to them. They just become super cumbersome to work
with. One way or another, you end up back at the withX() methods that we
already have and use.
$c = new List();
$c = $c->add(1);
$c = $c->add(3);
$c = $c->add(2);
// ...
Eew. I can do that already today, and I don't want to.
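For reference, "already today" looks roughly like the following sketch.
ImmutableIntList is a hypothetical class written for this illustration, not an
existing or proposed API.

```php
<?php
// Hypothetical class, for illustration only.
final class ImmutableIntList {
    /** @param list<int> $items */
    public function __construct(private readonly array $items = []) {}

    // Returns a new list; the original is left untouched.
    public function add(int $item): self {
        return new self([...$this->items, $item]);
    }

    /** @return list<int> */
    public function toArray(): array {
        return $this->items;
    }
}

$empty = new ImmutableIntList();
$empty->add(1);             // result discarded: $empty is unchanged

$c = $empty->add(1);
$c = $c->add(3);
$c = $c->add(2);

var_dump($empty->toArray() === []);    // bool(true)
var_dump($c->toArray() === [1, 3, 2]); // bool(true)
```

Forgetting the reassignment silently drops the change, which is a large part of
why this style is cumbersome.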
Here's the important observation: Speaking as the leading functional
programming in PHP fanboy, I don't really see much value at all to
intra-function immutability. It's just... not useful in PHP. Immutability at
function boundaries, that's super useful. But solving the problem at the
object-immutability level is the wrong place in PHP. (It is arguably the right
place in Haskell or ML, but PHP is not Haskell or ML.)
So IMO, the focus should be on just the function boundary semantics. The main
issue is how to make that work without wonky new syntax. Again, I don't have a
good answer, but would kindly request one. :-)
Finally, there's the question of equality. Be aware, PHP *already does value
equality for objects*:
https://3v4l.org/67ho1
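A minimal illustration of that built-in behavior (written for this message, not
a copy of the linked snippet; the Money class is hypothetical):

```php
<?php
// Hypothetical class, for illustration only.
final class Money {
    public function __construct(public int $amount, public string $currency) {}
}

$a = new Money(5, 'EUR');
$b = new Money(5, 'EUR');
$c = new Money(6, 'EUR');

var_dump($a == $b);   // bool(true): same class, equal properties
var_dump($a === $b);  // bool(false): two distinct instances
var_dump($a == $c);   // bool(false): a property differs
var_dump($a === $a);  // bool(true): same instance
```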
The issue isn't that it's not there, it's that it cannot be controlled. I am
not convinced that overriding === to mean logical equality rather than physical
equality, but only for data objects, is wise. And we already have == handled.
(I use that fact in my PHPUnit tests all the time.) What is missing is the
ability to control how that == comparison is made.
class Rect {
    private int $area;
    public function __construct(public readonly int $h, public readonly int $w) {}
    public function area(): int {
        return $this->area ??= $this->h * $this->w;
    }
}
$r1 = new Rect(4, 5);
$r2 = new Rect(4, 5);
print $r1->area();
var_dump($r1 == $r2); // What happens here?
Presumably, we'd want those to be equal without having to compute $area on $r2.
Right now, that's impossible, and those objects would not be equal. Fixing
that has... nothing to do with value semantics at all. It has to do with
operator overloading, and I'm already on record that I am very in favor of
addressing that.
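Until == can be customized, the usual user-land workaround is an explicit
comparison method. A sketch, assuming a hypothetical equals() method that
compares only the defining properties and ignores the cached $area:

```php
<?php
class Rect {
    private int $area;

    public function __construct(public readonly int $h, public readonly int $w) {}

    public function area(): int {
        // Compute once, then reuse the cached value.
        return $this->area ??= $this->h * $this->w;
    }

    // Hypothetical method: == will not call this.
    public function equals(Rect $other): bool {
        return $this->h === $other->h && $this->w === $other->w;
    }
}

$r1 = new Rect(4, 5);
$r2 = new Rect(4, 5);
print $r1->area() . "\n";    // 20

var_dump($r1 == $r2);        // bool(false): $area is initialized on $r1 only
var_dump($r1->equals($r2));  // bool(true)
```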
I hope that gives a better lay of the land for everyone in this thread.
--Larry Garfield
[1] https://thephp.foundation/blog/2024/08/19/state-of-generics-and-collections/#collections