On Sat, Nov 23, 2024, at 7:11 AM, Rob Landers wrote:
> Hello internals,
>
> Born from the Records RFC (https://wiki.php.net/rfc/records) 
> discussion, I would like to introduce to you a competing RFC: Data 
> Classes (https://wiki.php.net/rfc/dataclass). 
>
> This adds a new class modifier: data. This modifier drastically changes 
> how classes work, making them comparable by value instead of reference, 
> and any mutations behave more like arrays than objects (by value). If 
> desired, it can be combined with other modifiers, such as readonly, to 
> enforce immutability.
>
> I've been playing with this feature for a few days now, and it is 
> surprisingly intuitive to use. There is a (mostly) working 
> implementation available on GitHub 
> (https://github.com/php/php-src/pull/16904) if you want to have a go at 
> it.
>
> Example:
>
> data class UserId { public function __construct(public int $id) {} }
>
> $user = new UserId(12);
> // later
> $admin = new UserId(12);
> if ($admin === $user) { /* do something */ } // true
>
> Data classes are true value objects, with full copy-on-write optimizations:
>
> data class Point {
>   public function __construct(public int $x, public int $y) {}
>   public function add(Point $other): Point {
>     // illustrating value semantics, no copy yet
>     $previous = $this;
>     // a copy happens on the next line
>     $this->x = $this->x + $other->x;
>     $this->y = $this->y + $other->y;
>     assert($this !== $previous); // passes
>     return $this;
>   }
> }
>
> I think this would be an amazing addition to PHP. 
>
> Sincerely,
>
> — Rob

Oh boy.  Again, I think there's too much going on here, but I think that's 
because different people are operating under a different definition of what 
"value semantics" means.  Let me try to break down what I think are the 
constituent parts.

1. Pass-by-value.  This is what arrays, ints, strings, etc. do.  When you pass 
a value to a function, what you get is logically a new value.  It may be equal 
to the old one, it may be the same memory location as the old one, but that's 
hidden from you.  Logically, it's a new value.  (And if there's a shared memory 
location, CoW hides that from you, too.)  The intent here is to avoid "spooky 
action at a distance" (SAAAD) (that is, changing a value inside a function is 
guaranteed to not have any effect on the function that called it).

2. Logical equality.  This only applies to compound values (arrays and 
objects), but would imply checking equality by recursively checking equality on 
sub-elements.  (Properties in the case of objects, keys in the case of arrays.)

3. Physical equality.  This is what === does for objects: it checks that two 
variables refer to the same object instance.  Physical equality implies logical 
equality, but not vice versa.

4. Immutability.  A given variable's value cannot change.

5. Product types.  A type that is based on two or more other types.  (Eg, Point 
is a product of int and int.)
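
To make the first three items concrete, here is a minimal sketch of how PHP behaves today.  The function names (appendToArray, bumpCounter) are made up for illustration; nothing here comes from either RFC.

```php
<?php
// Arrays are passed by value: the callee works on its own logical copy.
function appendToArray(array $items): array {
    $items[] = 99;
    return $items;
}

// Objects are passed by handle: the callee sees the caller's instance.
function bumpCounter(stdClass $counter): void {
    $counter->n++;
}

$list = [1, 2, 3];
$changed = appendToArray($list);
// $list is still [1, 2, 3]; only $changed has the extra element.

$counter = new stdClass();
$counter->n = 0;
bumpCounter($counter);
// $counter->n is now 1: the mutation leaked back to the caller.

$a = new stdClass();
$a->n = 1;
$b = new stdClass();
$b->n = 1;
var_dump($a == $b);  // bool(true): same class, equal properties
var_dump($a === $b); // bool(false): two distinct instances
```

The array case is item 1, == on the two objects is item 2, and === on them is item 3.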

These are all circling around the same problem space, but are all different 
things.  For instance, rigidly immutable values make pass-by-value irrelevant, 
while pass-by-value avoids SAAAD without needing immutability.

I think that's the key place where Rob's approach and Ilija's approach differ.  
Rob's approaches (records and dataclass) are trying to solve SAAAD through 
immutability, one way or another.  Ilija's approach is trying to solve SAAAD 
through pass-by-value semantics.

By-value semantics would be really easy to implement by just auto-cloning an 
object at a function boundary.  However, that's also very wasteful, as the 
object probably won't be modified, making the clone just a memory hog.  The 
issue is that detecting a modification on nested objects is not particularly 
easy, which is how Ilija ended up with an explicit syntax to mark such 
modification.  (I personally dislike it, from a DX perspective, but I don't 
have any suggestions on how to avoid it.  If someone else does, please speak 
up.)
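
A rough sketch of the naive "clone at the boundary" idea in today's PHP, assuming simple Point and Circle classes and a hypothetical shiftAndGrow() callee.  It is only meant to show why nested objects are the hard part: clone is shallow, so the nested Point is still shared.

```php
<?php
class Point {
    public function __construct(public int $x, public int $y) {}
}

class Circle {
    public function __construct(public Point $center, public int $radius) {}
}

// Hypothetical callee that mutates whatever it receives.
function shiftAndGrow(Circle $c): void {
    $c->radius += 1;      // top-level property
    $c->center->x += 100; // nested object
}

$circle = new Circle(new Point(1, 2), 5);

// Emulate "by-value" passing by cloning eagerly at the call site.
shiftAndGrow(clone $circle);

var_dump($circle->radius);    // int(5): the clone absorbed this change
var_dump($circle->center->x); // int(101): the shared Point was modified
```

A __clone() method that deep-copies $center would close that hole, at the cost of copying the whole graph on every call whether or not anything is modified.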

Immutability semantics, as we've seen, seem easy but are actually quite 
logically complex once you get past the bare minimum.  (The bare minimum is 
already provided by readonly classes.  Problem solved.)

So I'm not sure we're all talking about solving the same problem, or solving it 
in the same way.

Moreover, I don't think we all agree on the use cases we're solving.  Let me 
offer a few examples.

1. Fancy typed values

readonly class UserID {
  public function __construct(public int $id) {}
}

This is already mostly supported, as above, just a bit verbose.  In this case, 
it makes sense that two equivalent objects are ==, and if we can make them === 
then that's a nice memory optimization, but not a requirement.  In this case, 
we're really just providing additional typing, and the immutability is trivial 
(and already supported).
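
For reference, a small sketch of what that class already gives you (readonly classes need PHP 8.2 or later); the variable names are only for illustration.

```php
<?php
readonly class UserID {
    public function __construct(public int $id) {}
}

$a = new UserID(12);
$b = new UserID(12);

var_dump($a == $b);  // bool(true): same class, equal properties
var_dump($a === $b); // bool(false): two distinct instances

$modified = false;
try {
    $a->id = 13; // throws Error: readonly properties cannot be reassigned
    $modified = true;
} catch (Error $e) {
    // Expected path; $a is unchanged.
}
var_dump($modified); // bool(false)
```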

2. Product types (part 1)

class Point {
  public function __construct(public int $x, public int $y) {}
}

Now here's the interesting part.  Should Point be immutable?  Should 
modifications to Point inside a function affect values outside the function?  
MAYBE!  It depends on the context.  In most cases, probably not.  However, 
consider a "registration/collection" use case of an event dispatcher:

class RegisterPluginsEvent {
  public function __construct(public array $pluginsToRegister) {}
}

This is a "data" class in that it is carrying data, and is not a service.  
However, we very clearly DO want SAAAD in this case.  That's the whole reason 
it exists.  Currently this case is solved by conventional classes, so I don't 
think there's anything to do here.

3. Product types (part 2)

Where it gets interesting is when you do need to modify an object, and 
propagate those changes, but NOT propagate the ability to change it.  Consider:

class Circle {
  public function __construct(public Point $center, public int $radius) {}
}

$c = new Circle(new Point(1, 2), 5);
if ($some_user_data) {
  $c->center->x = 10;
}

draw($c);

Here, *we do want the ability to modify $c after construction*.  However, we do 
NOT want to allow draw() to modify our $c.  This case is currently unsolved in 
PHP.

As above, there are two approaches to solving it: Making $c immutable generally, 
or making a copy (immediately or delayed) when passing to draw().  Making $c 
immutable generally would, in this case, be bad, because we do want the ability 
to modify $c before passing it.  Modifying it in place is just much more 
convenient than needing to compute everything ahead of time and pass it to the 
constructor like it's just a function.

4. Aggregate types

One of the main places that Ilija and I have discussed his structs proposal is 
collections[1].  In many languages, collections have both an in-place modifier 
and a clone-along-the-way modifier.  For instance, sort() and sorted(), 
reverse() and reversed(), etc.  (Details vary a little by language.)  Some 
languages also have both mutable and immutable versions of each collection type 
(Seq, Set, Map), with the in-place methods only available on the mutable 
variant.  There are also methods to convert a mutable collection into an 
immutable one and vice versa, which (I believe) implies making a copy.  Kotlin 
does both of the above, and is the model that I have been planning to pursue in 
PHP, eventually.

Ilija has argued that if we can flag collection classes as pass-by-value, then 
we don't need the immutable versions at all.  The only reason for the immutable 
versions to exist is to prevent SAAAD.  If that's already prevented by the 
passing semantics, then we don't need an explicitly immutable collection.

So that would mean:

$c = new List();
$c->add(1); // in place mutation.
$c->add(3); // in place mutation.
$c->add(2); // in place mutation.

function doStuff(List $l) {
  $l->sort(); // in-place mutation of a value-passed value.
  // do stuff with l.
}

doStuff($c);

var_dump($c); // Still ordered 1, 3, 2

So a sorted() method or an ImmutableList class wouldn't be necessary.  (I can 
see a use for sorted() anyway, to make it chainable, just like another recent 
RFC proposed for the existing sort() function.  That's related but a separate 
question.)
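
Plain PHP arrays already behave exactly this way, which may help as a mental model.  A minimal sketch, with the array standing in for the hypothetical pass-by-value List:

```php
<?php
function doStuff(array $l): array {
    sort($l); // in-place sort, but only of the callee's own copy
    return $l;
}

$c = [];
$c[] = 1; // in place mutation.
$c[] = 3; // in place mutation.
$c[] = 2; // in place mutation.

$sorted = doStuff($c);

var_dump($c);      // still ordered 1, 3, 2
var_dump($sorted); // ordered 1, 2, 3
```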

This approach would not be possible if data/record/struct/whatever classes have 
*any* built-in immutability to them.  They just become super cumbersome to work 
with.  One way or another, you end up back at the withX() methods that we 
already have and use.

$c = new List();
$c = $c->add(1);
$c = $c->add(3);
$c = $c->add(2);
// ...

Eew.  I can do that already today, and I don't want to.
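
For anyone who hasn't written one, this is roughly what such a class looks like today.  ImmutableList and toArray() are hypothetical names for the sketch, not part of any proposal.

```php
<?php
// Hypothetical immutable collection, written with existing features only.
final readonly class ImmutableList {
    public function __construct(private array $items = []) {}

    // Returns a new list; $this is never modified.
    public function add(mixed $item): self {
        return new self([...$this->items, $item]);
    }

    public function toArray(): array {
        return $this->items;
    }
}

$empty = new ImmutableList();
$c = $empty->add(1);
$c = $c->add(3);
$c = $c->add(2);

$empty->add(42); // result discarded, so nothing changes anywhere

var_dump($c->toArray());     // ordered 1, 3, 2
var_dump($empty->toArray()); // still empty
```

Forgetting to reassign, as in the $empty->add(42) line, is silently a no-op, which is part of what makes this style cumbersome.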

Here's the important observation: Speaking as the leading functional 
programming in PHP fanboy, I don't really see much value at all to 
intra-function immutability.  It's just... not useful in PHP.  Immutability at 
function boundaries, that's super useful.  But solving the problem at the 
object-immutability level is the wrong place in PHP.  (It is arguably the right 
place in Haskell or ML, but PHP is not Haskell or ML.)

So IMO, the focus should be on just the function boundary semantics.  The main 
issue is how to make that work without wonky new syntax.  Again, I don't have a 
good answer, but would kindly request one. :-)

Finally, there's the question of equality.  Be aware, PHP *already does value 
equality for objects*:

https://3v4l.org/67ho1

The issue isn't that it's not there, it's that it cannot be controlled.  I am 
not convinced that overriding === to mean logical equality rather than physical 
equality, but only for data objects, is wise.  And we already have == handled.  
(I use that fact in my PHPUnit tests all the time.)  What is missing is the 
ability to control how that == comparison is made.

class Rect {

  private int $area;

  public function __construct(public readonly int $h, public readonly int $w) {}

  public function area(): int {
    return $this->area ??= $this->h * $this->w;
  }
}

$r1 = new Rect(4, 5);
$r2 = new Rect(4, 5);
print $r1->area();
var_dump($r1 == $r2); // What happens here?

Presumably, we'd want those to be equal without having to compute $area on $r2.  
Right now, that's impossible, and those objects would not be equal.  Fixing 
that has... nothing to do with value semantics at all.  It has to do with 
operator overloading, and I'm already on record that I am very in favor of 
addressing that.
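
Until then, the only option is a hand-written comparison method.  A sketch, assuming a hypothetical equals() method that ignores the cached $area; == itself cannot be made to call it.

```php
<?php
class Rect {
    private int $area; // lazily computed cache

    public function __construct(public readonly int $h, public readonly int $w) {}

    public function area(): int {
        return $this->area ??= $this->h * $this->w;
    }

    // Hypothetical hand-written comparison that ignores the cache.
    public function equals(Rect $other): bool {
        return $this->h === $other->h && $this->w === $other->w;
    }
}

$r1 = new Rect(4, 5);
$r2 = new Rect(4, 5);
$r1->area(); // populates the cache on $r1 only

var_dump($r1 == $r2);       // bool(false): $area is set on one, uninitialized on the other
var_dump($r1->equals($r2)); // bool(true)
```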

I hope that gives a better lay of the land for everyone in this thread.

--Larry Garfield

[1] 
https://thephp.foundation/blog/2024/08/19/state-of-generics-and-collections/#collections
