On Tue, Apr 2, 2024, at 6:04 PM, Ilija Tovilo wrote:

>> What would be the reason not to?  As you indicated in another reply, the 
>> main reason some languages don't is to avoid large stack copies, but PHP 
>> doesn't have large stack copies for objects anyway so that's a non-issue.
>>
>> I've long argued that the fewer differences there are between service 
>> classes and data classes, the better, so I'm not sure what advantage this 
>> would have other than "ugh, inheritance is such a mess" (which is true, but 
>> that ship sailed long ago).
>
> One issue that just came to mind is object identity. For example:
>
> class Person {
>     public function __construct(
>         public string $firstname,
>         public string $lastname,
>     ) {}
> }
>
> class Manager extends Person {
>     public function bossAround() {}
> }
>
> $person = new Person('Boss', 'Man');
> $manager = new Manager('Boss', 'Man');
> var_dump($person === $manager); // ???
>
> Equality for data objects is based on data, rather than the object
> handle. How does this interact with inheritance? Technically, Person
> and Manager represent the same data. Manager contains additional
> behavior, but does that change identity?
>
> I'm not sure what the answer is. That's just the first thing that came
> to mind. I'm confident we'll discover more such edge cases. Of course,
> I can invest the time to find the questions before deciding to
> disallow inheritance.

As Bruce already demonstrated, equality should include type, not just 
properties.  Even without inheritance that is necessary.
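For reference, ordinary PHP objects already behave this way today. `==` compares class and property values, while `===` compares instance identity. The sketch below reuses the Person/Manager classes from Ilija's example as plain classes (no data-class syntax, which does not exist yet). It assumes PHP 8.0+ for constructor promotion.

```php
<?php
class Person {
    public function __construct(
        public string $firstname,
        public string $lastname,
    ) {}
}

class Manager extends Person {
    public function bossAround(): void {}
}

$person  = new Person('Boss', 'Man');
$manager = new Manager('Boss', 'Man');
$twin    = new Person('Boss', 'Man');

// == on objects requires the same class AND equal property values.
var_dump($person == $manager); // bool(false): different classes
var_dump($person == $twin);    // bool(true): same class, same properties

// === on objects means "the very same instance".
var_dump($person === $twin);   // bool(false)
```

A data-class equality that keeps the class check would simply preserve the existing `==` rule.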

There may be good reason to omit inheritance, as we did on enums, but that 
shouldn't be the starting point.  (I'd have to research and see what other 
languages do. I think it's a mixed bag.)  We should try to ferret out those 
edge cases and see if there are reasonable solutions to them.

>> > * Mutating method calls on data classes use a slightly different
>> > syntax: `$vector->append!(42)`. All methods mutating `$this` must be
>> > marked as `mutating`. The reason for this is twofold: 1. It signals to
>> > the caller that the value is modified. 2. It allows `$vector` to be
>> > cloned before knowing whether the method `append` is modifying, which
>> > hugely reduces implementation complexity in the engine.
>>
>> As discussed in R11, it would be very beneficial if this marker could be on 
>> the method definition, not the method invocation.  You indicated that would 
>> be Hard(tm), but I think it's worth some effort to see if it's surmountably 
>> hard.  (Or at least less hard than just auto-detecting it, which you 
>> indicated is Extremely Hard(tm).)
>
> I think you misunderstood. The intention is to mark both call-site and
> declaration. Call-site is marked with ->method!(), while declaration
> is marked with "public mutating function". Call-site is required to
> avoid the engine complexity, as previously mentioned. But
> declaration-site is required so that the user (and IDEs) even know
> that you need to use the special syntax at the call-site.

Ah, OK.  That's... unfortunate, but I defer to you on the implementation 
complexity.

>> So to the extent there is a consensus, equality, stringifying, and a 
>> hashcode (which we don't have yet, but will need in the future for some 
>> things I suspect) seem to be the rough expected defaults.
>
> I'm just skeptical whether the default __toString() is ever useful. I
> can see an argument for it for quick debugging in languages that don't
> provide something like var_dump(). In PHP this seems much less useful.
> It's impossible to provide a default implementation that works
> everywhere (or pretty much anywhere, even).
>
> Equality is already included. Hashing should be added separately, and
> probably not just to data classes.

The equivalent of Python's __repr__ (which it auto-generates) would be 
__debugInfo().  Arguably its current output is what the default would likely be 
anyway, though.  I believe the typical auto-toString output is the same data, 
but presented in a more human-friendly way.  (So yes, mainly useful for 
debugging.)
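To make the distinction concrete, here is a rough sketch of what an auto-generated, human-friendly `__toString()` might print, next to the existing `var_dump()` view. The `AutoToString` trait and the `Point` class are hypothetical illustrations, not part of any proposal; the output format is an assumption.

```php
<?php
// Hypothetical trait approximating an auto-generated __toString():
// the same data var_dump() shows, formatted on one human-friendly line.
trait AutoToString {
    public function __toString(): string {
        $parts = [];
        foreach (get_object_vars($this) as $name => $value) {
            $parts[] = $name . ': ' . var_export($value, true);
        }
        return static::class . '(' . implode(', ', $parts) . ')';
    }
}

class Point {
    use AutoToString;
    public function __construct(public int $x, public int $y) {}
}

$p = new Point(1, 2);
echo $p, PHP_EOL; // Point(x: 1, y: 2)
var_dump($p);     // the existing debugging view of the same properties
```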

Equality, well, we've already debated whether or not we should make that a 
general feature. :-)  Of note, though, in languages with equals(), it's also 
user-overridable.

>> > * In the future, it should be possible to allow using data classes in
>> > `SplObjectStorage`. However, because hashing is complex, this will be
>> > postponed to a separate RFC.

I believe this is where we would want/need a __hash() method or similar; Derick 
and I encountered that while researching collections in other languages.  
Leaving it out for now is fine, but it would be important for any future 
list-of functionality.
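Today, the closest existing hook is overriding `SplObjectStorage::getHash()`, which lets a subclass decide which objects share a slot. The sketch below hashes by class name plus serialized public properties. `ValueStorage` and that hashing scheme are assumptions for illustration only, not a proposed `__hash()` design.

```php
<?php
// Hypothetical subclass: value-based keys via the real getHash() hook.
class ValueStorage extends SplObjectStorage {
    public function getHash(object $object): string {
        // Class name is part of the hash, mirroring "equality includes type".
        return $object::class . ':' . serialize(get_object_vars($object));
    }
}

final class Point {
    public function __construct(public int $x, public int $y) {}
}

$set = new ValueStorage();
$set->attach(new Point(1, 2));
$set->attach(new Point(1, 2)); // equal value, same hash, same slot
$set->attach(new Point(3, 4));

echo count($set), PHP_EOL;                 // 2
var_dump($set->contains(new Point(1, 2))); // bool(true)
```

A built-in hash method would avoid the cost of serializing on every lookup, which is part of why it deserves its own RFC.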

>> Would data class properties only be allowed to be other data classes, or 
>> could they hold a non-data class?  My knee-jerk response is they should be 
>> data classes all the way down; the only counter-argument I can think of 
>> would be how much existing code is out there that is a "data class" in all 
>> but name.  I still fear someone adding a DB connection object to a data 
>> class and everything going to hell, though. :-)
>
> Disallowing ordinary by-ref objects is not trivial without additional
> performance penalties, and I don't see a good reason for it. Can you
> provide an example on when that would be problematic?
>
> Ilija

There are two aspects to it, as I see it.

data class A {
  public function __construct(public string $name) {}
}

data class B {
  public function __construct(
    public A $a,
    public PDO $conn,
  ) {}
}

$b = new B(new A('Ilija'), $pdoConnection);

function stuff(B $b2) {
  $b2->a->name = 'Larry';
  // This triggers a CoW on $b2, separating it from $b, and also creating a
  // new instance of A.  What about $conn?  Does it get cloned?  That would
  // be bad.  Does it not get cloned?  Then it seems weird for it to still be
  // shared on a data object.

  $b2->conn->beginTransaction();
  // This I would say is technically a modification, since the state of the
  // connection is changing.  But then should this trigger $b2 separating
  // from $b?  Neither answer is obvious to me.
}

stuff($b);

In a sense, it's similar to the "PSR-7 is immutable, asterisk, streams" issue 
that has often been pointed out.  "Data objects are safe to pass around and 
will self-clone when needed, asterisk, unless there's a normal object in it and 
then it's non-obvious" doesn't sound like a good mental model to give people.

Or consider DateTime.  It's mutable.  Should mutating it trigger a clone of a 
data object that has a DateTime property?  I can realistically argue both ways, 
and I'm not convinced either is right; just that neither is intuitive.
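For comparison, explicit `clone` in today's PHP picks one answer: it is shallow, so a copied object keeps sharing its DateTime unless the class writes a `__clone()` that deep-copies it. A small sketch (the `Event` class is hypothetical):

```php
<?php
// Today's PHP: clone copies properties, but object properties are still
// handles, so both copies point at the same DateTime instance.
final class Event {
    public function __construct(
        public string $name,
        public DateTime $when,
    ) {}
}

$original = new Event('launch', new DateTime('2024-04-02'));
$copy = clone $original;

$copy->name = 'relaunch';      // only the copy changes
$copy->when->modify('+1 day'); // mutates the DateTime both objects share

echo $original->name, PHP_EOL;                  // launch
echo $original->when->format('Y-m-d'), PHP_EOL; // 2024-04-03
```

An automatic CoW for data classes would have to pick between this shallow behavior and an implicit deep copy, which is exactly the ambiguity above.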

"Data classes all the way down" resolves this problem.

The caveat would be that a genuinely immutable object would (probably?) be safe 
(DateTimeImmutable, or a readonly class), so maybe we can make readonly classes 
an exception?  Ah, no, we cannot, because despite what PHPStan insists, there's 
no reason that the single write to a readonly property must happen at 
construction.  It can easily happen as a side effect of another method (e.g., a 
cached value), meaning readonly objects are not truly immutable.  In fact, 
readonly objects can have non-readonly objects on their properties, too.  So I 
don't think that's safe, either.
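Both loopholes can be shown in a few lines, assuming PHP 8.2+ for `readonly class`. The `Report` class is hypothetical: its one permitted write to `$summary` happens lazily in a method, and its readonly `DateTime` property still holds a mutable object.

```php
<?php
readonly class Report {
    public string $summary; // not set in the constructor

    public function __construct(
        public array $rows,
        public DateTime $generatedAt,
    ) {}

    public function summary(): string {
        // The single allowed write to a readonly property can happen here,
        // long after construction, as a caching side effect.
        if (!isset($this->summary)) {
            $this->summary = count($this->rows) . ' rows';
        }
        return $this->summary;
    }
}

$r = new Report([1, 2, 3], new DateTime('2024-04-02'));
echo $r->summary(), PHP_EOL;                    // 3 rows

// Allowed: the property is never reassigned, but the object it holds changes.
$r->generatedAt->modify('+1 year');
echo $r->generatedAt->format('Y-m-d'), PHP_EOL; // 2025-04-02
```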

The other aspect is, eg, serialization.  People will come to expect 
(reasonably) that a data class will have certain properties (in the abstract 
sense, not lexical sense).  For instance, most classes are serializable, but a 
few are not.  (E.g., if they have a reference to PDO or a file handle or 
something unserializable.)  Data classes seem like they should be safe to 
serialize always, as they're "just data".  If data classes are limited to 
primitives and data classes internally, that means we can effectively guarantee 
that they will be serializable, always.  If one of the properties could be a 
non-serializable object, that assumption breaks.
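As a sketch of how the guarantee breaks today: a single unserializable property makes the whole object unserializable. A Closure stands in for PDO here, since serializing a PDO instance also throws but needs a database driver to demonstrate. `Settings` and `isSerializable()` are hypothetical helpers.

```php
<?php
final class Settings {
    public function __construct(
        public string $name,
        public mixed $extra = null, // could hold anything, including a Closure
    ) {}
}

// Hypothetical helper: true if serialize() succeeds on the object graph.
function isSerializable(object $value): bool {
    try {
        serialize($value);
        return true;
    } catch (Throwable $e) { // "Serialization of 'Closure' is not allowed"
        return false;
    }
}

var_dump(isSerializable(new Settings('plain')));                // bool(true)
var_dump(isSerializable(new Settings('hooked', fn () => 42)));  // bool(false)
```

Restricting data-class properties to primitives and other data classes would make the first case the only possible one.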

There are probably other examples besides serialization where thinking of 
something as "data" versus "logic" leads to different assumptions, and we 
shouldn't stealthily break those.

--Larry Garfield
