Hi Larry

On Tue, Apr 2, 2024 at 5:31 PM Larry Garfield <la...@garfieldtech.com> wrote:
>
> On Tue, Apr 2, 2024, at 12:17 AM, Ilija Tovilo wrote:
> > Hi everyone!
> >
> > I'd like to introduce an idea I've played around with for a couple of
> > weeks: Data classes, sometimes called structs in other languages (e.g.
> > Swift and C#).
> >
> > * Data classes are ordinary classes, and as such may implement
> > interfaces, methods and more. I have not decided whether they should
> > support inheritance.
>
> What would be the reason not to?  As you indicated in another reply, the main 
> reason some languages don't is to avoid large stack copies, but PHP doesn't 
> have large stack copies for objects anyway so that's a non-issue.
>
> I've long argued that the fewer differences there are between service classes 
> and data classes, the better, so I'm not sure what advantage this would have 
> other than "ugh, inheritance is such a mess" (which is true, but that ship 
> sailed long ago).

One issue that just came to mind is object identity. For example:

class Person {
    public function __construct(
        public string $firstname,
        public string $lastname,
    ) {}
}

class Manager extends Person {
    public function bossAround() {}
}

$person = new Person('Boss', 'Man');
$manager = new Manager('Boss', 'Man');
var_dump($person === $manager); // ???

Equality for data objects is based on data, rather than the object
handle. How does this interact with inheritance? Technically, Person
and Manager represent the same data. Manager contains additional
behavior, but does that change identity?
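
For reference, here is a minimal sketch of what current PHP does with
ordinary classes. It says nothing about how data classes would behave;
the $twin variable is only added for illustration. Today, loose
equality (==) already requires both objects to be instances of the same
class, so the subclass compares as not equal even though the properties
match:

```php
<?php
// Current PHP semantics only; data classes do not exist yet.
class Person {
    public function __construct(
        public string $firstname,
        public string $lastname,
    ) {}
}

class Manager extends Person {
    public function bossAround() {}
}

$person  = new Person('Boss', 'Man');
$twin    = new Person('Boss', 'Man');   // illustrative extra instance
$manager = new Manager('Boss', 'Man');

$sameClassLoose  = ($person == $twin);     // same class, equal properties
$subclassLoose   = ($person == $manager);  // equal properties, different class
$sameClassStrict = ($person === $twin);    // two distinct object handles

var_dump($sameClassLoose);  // bool(true)
var_dump($subclassLoose);   // bool(false)
var_dump($sameClassStrict); // bool(false)
```

Whether === on data classes should follow the same "same class" rule is
exactly the open question.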

I'm not sure what the answer is. That's just the first thing that came
to mind. I'm confident we'll discover more such edge cases. Of course,
I can invest the time to find the questions before deciding to
disallow inheritance.

> > * Mutating method calls on data classes use a slightly different
> > syntax: `$vector->append!(42)`. All methods mutating `$this` must be
> > marked as `mutating`. The reason for this is twofold: 1. It signals to
> > the caller that the value is modified. 2. It allows `$vector` to be
> > cloned before knowing whether the method `append` is modifying, which
> > hugely reduces implementation complexity in the engine.
>
> As discussed in R11, it would be very beneficial if this marker could be on 
> the method definition, not the method invocation.  You indicated that would 
> be Hard(tm), but I think it's worth some effort to see if it's surmountably 
> hard.  (Or at least less hard than just auto-detecting it, which you 
> indicated is Extremely Hard(tm).)

I think you misunderstood. The intention is to mark both call-site and
declaration. Call-site is marked with ->method!(), while declaration
is marked with "public mutating function". Call-site is required to
avoid the engine complexity, as previously mentioned. But
declaration-site is required so that users (and IDEs) know they need
to use the special syntax at the call-site.
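
To make the call-site argument more concrete, here is a rough emulation
in current PHP. Neither "mutating" nor "->append!()" is valid syntax
today, so the Vector class below is a hypothetical stand-in, and the
explicit clone is my assumption of roughly what the engine would do on
the caller's behalf (presumably only when the value is actually shared):

```php
<?php
// Emulation in current PHP. "mutating" and "->append!()" are proposed
// syntax and do not exist today; Vector is a hypothetical example class.
final class Vector {
    public function __construct(public array $items = []) {}

    // Under the proposal this would be "public mutating function append".
    public function append(int $value): void {
        $this->items[] = $value;
    }
}

$a = new Vector([1, 2]);
$b = $a; // ordinary objects: $b refers to the same object as $a

// Hand-written stand-in for "$b->append!(42)": separate first, then mutate.
// Because the call-site is marked, the separation can happen before the
// method is even looked up.
$b = clone $b;
$b->append(42);

var_dump($a->items); // still [1, 2]
var_dump($b->items); // [1, 2, 42]
```

Without the marker at the call-site, the engine would have to resolve
the method first to find out whether a separation is needed at all.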

> So to the extent there is a consensus, equality, stringifying, and a hashcode 
> (which we don't have yet, but will need in the future for some things I 
> suspect) seem to be the rough expected defaults.

I'm just skeptical that a default __toString() is ever useful. I
can see an argument for it for quick debugging in languages that don't
provide something like var_dump(). In PHP this seems much less useful.
It's impossible to provide a default implementation that works
everywhere (or pretty much anywhere, even).

Equality is already included. Hashing should be added separately, and
probably not just to data classes.

> > * In the future, it should be possible to allow using data classes in
> > `SplObjectStorage`. However, because hashing is complex, this will be
> > postponed to a separate RFC.
>
> Would data class properties only be allowed to be other data classes, or 
> could they hold a non-data class?  My knee jerk response is they should be 
> data classes all the way down; the only counter-argument I can think of
> would be how much existing code is out there that is a "data class" in all 
> but name.  I still fear someone adding a DB connection object to a data class 
> and everything going to hell, though. :-)

Disallowing ordinary by-ref objects is not trivial without additional
performance penalties, and I don't see a good reason for it. Can you
provide an example of when that would be problematic?

Ilija
