Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

Ilija Tovilo Wed, 03 Apr 2024 11:11:14 -0700

Hi Larry

On Wed, Apr 3, 2024 at 12:03 AM Larry Garfield <la...@garfieldtech.com> wrote:
>
> On Tue, Apr 2, 2024, at 6:04 PM, Ilija Tovilo wrote:
>
> > I think you misunderstood. The intention is to mark both call-site and
> > declaration. Call-site is marked with ->method!(), while declaration
> > is marked with "public mutating function". Call-site is required to
> > avoid the engine complexity, as previously mentioned. But
> > declaration-site is required so that the user (and IDEs) even know
> > that you need to use the special syntax at the call-site.
>
> Ah, OK.  That's... unfortunate, but I defer to you on the implementation 
> complexity.


As I've argued, I believe the different syntax is a positive. This
way, data classes are known to stay unmodified unless:

1. You're explicitly modifying it yourself.
2. You're calling a mutating method, with its associated syntax.
3. You're creating a reference from the value, either explicitly or by
passing it to a by-reference parameter.

By-reference argument passing is the only way that mutations of data
classes can be hidden (given that by-reference arguments look exactly
like normal by-value arguments at the call site), and it's arguably a
flaw of by-reference passing itself. In all other cases, you can expect
your value _not_ to unexpectedly change. For this reason, I consider
data classes an alternative approach to readonly classes.
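
To illustrate the by-reference point with something that runs today,
here is a minimal sketch using arrays, which already have value
semantics. The function names are hypothetical and not part of the
proposal. Both calls look identical at the call site, but only one of
them changes the caller's value.

```php
<?php
// Hypothetical helpers for illustration only; neither is from the RFC.
function renameByValue(array $person): void
{
    $person['name'] = 'changed'; // modifies the local copy only
}

function renameByRef(array &$person): void
{
    $person['name'] = 'changed'; // modifies the caller's variable
}

$person = ['name' => 'Larry'];

renameByValue($person);
$afterByValue = $person['name']; // still 'Larry'

renameByRef($person); // the call site gives no hint of the mutation
$afterByRef = $person['name']; // now 'changed'
```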

> > Disallowing ordinary by-ref objects is not trivial without additional
> > performance penalties, and I don't see a good reason for it. Can you
> > provide an example on when that would be problematic?
>
> There's two aspects to it, that I see.
>
> data class A {
>   public function __construct(public string $name) {}
> }
>
> data class B {
>   public function __construct(
>     public A $a,
>     public PDO $conn,
>   ) {}
> }
>
> $b = new B(new A(), $pdoConnection);
>
> function stuff(B $b2) {
>   $b2->a->name = 'Larry';
>   // This triggers a CoW on $b2, separating it from $b, and also creating a
>   // new instance of A.  What about $conn?
>   // Does it get cloned?  That would be bad.  Does it not get cloned?  That
>   // seems weird that it's still the same on a data object.
>
>   $b2->conn->beginTransaction();
>   // This I would say is technically a modification, since the state of the
>   // connection is changing.  But then should this trigger $b2 cloning from
>   // $b1?  Neither answer is obvious to me.
> }

IMO, the answer is relatively straightforward: PDO is a reference
type. For all intents and purposes, when you're passing B to stuff(),
B is copied. Since B::$conn is a "reference" (read: pointer), copying B
doesn't copy the connection, only the reference to it. B::$a, however,
is a value type, so copying B also copies A. The fact that this isn't
_exactly_ what happens under the hood due to CoW is an implementation
detail; it doesn't need to change how you think about it. From the
user's standpoint, $b and $b2 can already be considered separate values
once stuff() is called.

This is really no different from arrays:

```php
$b = ['a' => ['name' => 'Larry'], 'conn' => $pdoConnection];
$b2 = $b; // $b is detached from $b2, $b['conn'] remains a shared object.
```
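
A slightly longer sketch of the same array analogy, runnable as-is.
It assumes a plain stdClass object in place of the PDO connection so
that no database driver is needed; the inTransaction property is made
up for illustration.

```php
<?php
// stdClass stands in for the PDO connection (an assumption, so the
// sketch runs without a database).
$conn = new stdClass();
$conn->inTransaction = false;

$b = ['a' => ['name' => 'Ilija'], 'conn' => $conn];
$b2 = $b; // conceptually a copy; the engine defers the actual copy (CoW)

$b2['a']['name'] = 'Larry';        // separates $b2 and its nested 'a' from $b
$b2['conn']->inTransaction = true; // goes through the shared object handle

$nameInB = $b['a']['name'];               // 'Ilija': the nested array is a value
$sharedConn = $b['conn'] === $b2['conn']; // true: both hold the same object
$flagInB = $b['conn']->inTransaction;     // true: the change is visible via $b
```

The nested array behaves like B::$a in the data class example, and the
object behaves like B::$conn.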

> The other aspect is, eg, serialization.  People will come to expect 
> (reasonably) that a data class will have certain properties (in the abstract 
> sense, not lexical sense).  For instance, most classes are serializable, but 
> a few are not.  (Eg, if they have a reference to PDO or a file handle or 
> something unserializable.)  Data classes seem like they should be safe to 
> serialize always, as they're "just data".  If data classes are limited to 
> primitives and data classes internally, that means we can effectively 
> guarantee that they will be serializable, always.  If one of the properties 
> could be a non-serializable object, that assumption breaks.

I'm not sure that's a convincing argument to fully disallow reference
types, especially since it would prevent you from storing
DateTimeImmutables and other immutable values in data classes and thus
break many valid use-cases. That would arguably be very limiting.

> There's probably other similar examples besides serialization where "think of 
> this as data" and "think of this as logic" is how you'd want to think, which 
> leads to different assumptions, which we shouldn't stealthily break.

I think your assumption here is that non-data classes cannot contain
data. This doesn't hold, and especially will not until data classes
become more common. Readonly classes can be considered strict versions
of data classes in terms of mutability, minus some of the other
semantic changes (e.g. identity).
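
As a rough sketch of that comparison, here is what a readonly class
looks like today (PHP 8.2+). The Point class and its withX() method are
hypothetical names chosen for illustration. Changes have to go through
a new instance, and the objects still have identity.

```php
<?php
// Hypothetical value object; requires PHP 8.2+ for readonly classes.
final readonly class Point
{
    public function __construct(public int $x, public int $y) {}

    // A "modification" has to produce a new instance.
    public function withX(int $x): static
    {
        return new static($x, $this->y);
    }
}

$p1 = new Point(1, 2);
$p2 = $p1->withX(1); // same values, different object

$sameValues = $p1 == $p2;    // true: property-wise comparison
$sameIdentity = $p1 === $p2; // false: objects still have identity

$rejected = false;
try {
    $p1->x = 10; // readonly properties cannot be reassigned
} catch (Error $e) {
    $rejected = true;
}
```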

Ilija
