Re: [PHP-DEV] RFC Proposal: Readonly Structs in PHP

Larry Garfield Tue, 12 Sep 2023 12:11:22 -0700

On Fri, Sep 8, 2023, at 1:12 PM, Lanre Waju wrote:
> Dear PHP Internals,
>
> I am writing to propose a new feature for PHP that introduces the 
> concept of structs. This feature aims to provide a more concise and 
> expressive way to define and work with immutable data structures. Below 
> is a detailed description of the proposed syntax, usage, and behavior.
>
> Syntax
>
> struct Data
> {
>      string $title;
>      Status $status;
>      ?DateTimeImmutable $publishedAt = null;
> }
> The Data struct is essentially represented as a readonly class with a 
> constructor as follows:
>
>
> readonly class Data
> {
>      public function __construct(
>          public string $title,
>          public Status $status,
>          public ?DateTimeImmutable $publishedAt = null,
>      ) {}
> }
> Assertions
> The Data struct will always be readonly.
> It has no methods besides the constructor.
> Constructors
> The Data struct can be constructed in three different ways, each of 
> which allows for named or positional arguments, which can be mixed:
>
> 1.1 Class like
> $data = new Data('title', Status::PUBLISHED, new DateTimeImmutable());
>
> 1.2 Class like (Named Syntax)
> $data = new Data(title: 'title', status: Status::PUBLISHED, publishedAt: 
> new DateTimeImmutable());
>
> 2.1 Proposed struct initialization syntax (Positional Arguments)
> $data = Data{'title', Status::PUBLISHED, new DateTimeImmutable()};
>
> 2.2 Proposed struct initialization syntax (Named Syntax)
> $data = Data{title: 'title', status: Status::PUBLISHED, publishedAt: new 
> DateTimeImmutable()};
>
> 3.1 Anonymous Struct (Named Arguments)
>
> $data = struct {
>      string $title;
>      Status $status;
>      ?DateTimeImmutable $publishedAt = null;
> }('title', Status::PUBLISHED, new DateTimeImmutable());
> 3.2 Anonymous Struct (Named Arguments - Named Syntax)
>
> $data = struct {
>      string $title;
>      Status $status;
>      ?DateTimeImmutable $publishedAt = null;
> }(title: 'title', status: Status::PUBLISHED, publishedAt: new 
> DateTimeImmutable());
> Nesting
> The proposed feature also supports nesting of structs. For example:
>
>
> final class HasNestedStruct
> {
>      NestedStruct {
>          string $title;
>          Status $status;
>          ?DateTimeImmutable $publishedAt = null;
>      };
>
>      public function __construct(
>          public string $string,
>          public Data $normalStruct,
>          public NestedStruct $nestedStruct = NestedStruct{'title', 
> Status::PUBLISHED, new DateTimeImmutable()},
>          public struct InlineNamed { int $x} $inlineNamed = {x: 1},
>          public { int $x, int $y} $inlineAnonymous = {x: 1, y: 2},
>      ) {}
> }
> This proposal aims to enhance the readability and maintainability of 
> code by providing a more concise and expressive way to work with 
> immutable data structures in PHP.
> I believe this feature will be a valuable addition to the language as it 
> not only opens the door for future enhancements (eg. typed json 
> deserialization, etc.), but should also help reduce reliance on arrays 
> by providing a more expressive alternative.
>
> Your feedback and suggestions are highly appreciated, and we look 
> forward to discussing this proposal further within the PHP internals 
> community.
>
> Sincerely
> Lanre


As I have stated in the past, I am firmly opposed to anemic structs.  They 
offer no benefit, much confusion, and more work for future RFCs.

The core concept -- that service objects and data objects are separate 
creatures that should not be commingled -- I fully agree with and advocate for.  
If I were writing PHP from scratch today, I would probably design it with 
separate constructs, or take a cue from Go/Rust and eliminate classes 
altogether per se, as they just confuse matters.  However, we are dealing with 
PHP as it exists today, and an entirely separate limited construct just doesn't 
make sense.

I also want to make clear that I am 1000% in favor of structured, typed data.  
The use of associative arrays as a pseudo data structure is the weakest part of 
PHP, and the more we can move people away from that towards more formally typed 
data, the better.  For that reason, making it trivial to cast between an 
associative array and a structured object (as a few others in the thread have 
suggested) is a *bad* feature, because it further reinforces the idea that an 
associative array is "just as good" as making a defined type.  This is simply 
flat out false, and we should avoid language features that pretend that it is 
true.

That said, as of PHP 8, we already have a perfectly good struct-ish data 
structure: classes with promoted properties and named arguments.  As of PHP 
8.2, the entire class can be declared readonly with a single keyword.  For 95% 
of use cases, this is completely adequate as a struct-like structure:

readonly class Person
{
     public function __construct(
         public string $first,
         public string $last,
         public ?DateTimeImmutable $birthday = null,
     ) {}
}

$p = new Person(first: 'Larry', last: 'Garfield');

So where does it fall short?

1. The proposal above suggests the problem is that classes allow methods.  Why is that an 
issue?  Why are methods a problem on a data-centric object?  This is never 
explained, and I don't believe it to be true.  While methods that call out to 
other service objects would definitely be bad juju, a fullname() method on the 
above class poses no problems whatsoever.  There is no theoretical purity being 
protected by disallowing methods.  By the same logic, would structs also forbid 
property hooks, assuming those pass?  I would hope not, as data objects are 
where those are most useful.  In fact, I'd go a step further and note that a 
readonly struct that disallows methods *precludes* the "with-er" style of 
evolving an object, so if you want "the same thing but with this one change", 
you have to completely recreate a new struct from scratch.  This is a worse 
experience in every way.

2. The proposal above suggests that it's because the `new` keyword is needed, 
and proposes both positional and named function-esque syntax, making it look 
more like Kotlin or Rust where there is no `new` keyword and the class name is 
itself the constructor.  I will agree that `new` is clumsy in many cases, 
particularly in compound expressions, but that's not an issue unique to 
data-centric objects.  If we were to come up with some alternate constructor 
invocation to make it easier for data-centric objects, it would be equally 
useful on non-data objects as well.  There's no reason to make it specific to 
just data structs.  (I am also not certain the parser could even handle 
that, since functions and classes currently live in separate symbol tables, so 
if both a class foo and a function foo are defined, `foo()` is ambiguous.)

3. The proposal suggests the ability to have nested struct definitions.  
I can see where this is useful, certainly.  However... I can also see where 
it's useful on service objects, too.  Many languages have such a feature, often 
called "inner classes," and it works just fine on service objects as well as 
data objects.  Inner classes would be an interesting feature in itself that 
would be worth its own RFC (I won't guarantee that I'd support it, depending on 
the details, but I am quite open to considering it), but there's no good reason 
to limit that functionality to just data classes.

4. The proposal implies that structs should be always readonly.  As noted 
above, a readonly class is trivial to define now.  Moreover, while I am an 
outspoken proponent of immutable data structures, they are not appropriate in 
all cases.  PSR-14 events, for example, are deliberately mutable because, given 
PHP's design, making them immutable would have required a lot of extra work 
from anyone writing a listener for very little benefit.  Entities are another 
example of a data-object that logically needs to be mutable.  Mutable 
data-centric product types have their place, and this approach would preclude 
that.

5. MWOP suggested in a reply that allowing a struct to conform to a struct 
definition structurally by the properties it has would be useful.  Potentially 
yes.  However, interface properties, part of the hooks RFC, would get us to 
almost the same place without the weird world-splitting between structs and 
classes.

6. When dealing with a mutable data value, objects pass by handle (feels like 
reference even if it's not), but data feels like it should pass by value.  
Valid!  This is a long-standing gripe, and the growth of with-er style value 
objects (PSR-7 et al) is in large part to avoid that risk of "spooky action 
at a distance."  However... the above proposal does not address this at all!
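
Points 1 and 6 can be made concrete with a minimal sketch, assuming PHP 8.2 or 
later.  The names `Author`, `fullName()`, `withFirst()`, `Counter`, and 
`bump()` are illustrative only and appear nowhere in the proposal.

```php
<?php

// Point 1: a readonly data class carrying two harmless methods.
readonly class Author
{
    public function __construct(
        public string $first,
        public string $last,
        public ?DateTimeImmutable $birthday = null,
    ) {}

    // Derived data only; this calls out to no service object.
    public function fullName(): string
    {
        return "{$this->first} {$this->last}";
    }

    // "With-er": returns a new instance with one property changed.
    public function withFirst(string $first): static
    {
        return new static($first, $this->last, $this->birthday);
    }
}

$larry = new Author(first: 'Larry', last: 'Garfield');
$lawrence = $larry->withFirst('Lawrence'); // $larry itself is untouched

// Point 6: a mutable object is passed by handle, so the callee's
// change is visible to the caller.
final class Counter
{
    public int $count = 0;
}

function bump(Counter $c): void
{
    $c->count++;
}

$counter = new Counter();
bump($counter); // $counter->count is now 1 in the caller's scope
```

Without methods, `withFirst()` would have to live in a free function or be 
spelled out as a full constructor call at every call site.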

I would argue that point 6 is the only valid argument for a separate construct 
from classes as they already exist.  But there is no need to create a whole 
other construct (a very significant implementation lift) to achieve that.  All 
that would be needed is a flag/marker on the class to indicate that it should use 
data-like passing semantics.  Kotlin has a good example here, where you can 
declare a class a "data" class by just adding the `data` keyword.  That has a 
number of implications in Kotlin (many of which are not relevant for us), but 
in our case it would mean either to pass the object by value, or to 
automatically clone it every time it is passed.  (The two would be almost the 
same to the end user, but likely have different implementation challenges.  I 
cannot speak to what those would be personally, but it's an implementation 
detail not relevant for now.)
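
PHP has no `data` keyword today, so the following is only an approximation of 
the "clone on every pass" option: the caller clones by hand where a 
hypothetical `data` flag would do so implicitly.  `Point` and `moveRight()` 
are made-up names for illustration.

```php
<?php

// An ordinary mutable class; imagine it declared as `data class Point`.
final class Point
{
    public function __construct(
        public int $x = 0,
        public int $y = 0,
    ) {}
}

// Mutates whatever object it is handed and returns it.
function moveRight(Point $p): Point
{
    $p->x++;
    return $p;
}

$origin = new Point();

// Explicit clone: the callee works on a copy, as by-value passing would.
$moved = moveRight(clone $origin);
$originXAfterClonedCall = $origin->x; // still 0

// No clone: the callee mutates the caller's object through the handle.
$shared = moveRight($origin);
```

The only difference between the two calls is the `clone` at the call site, 
which is the step an opt-in flag on the class would make automatic.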

That very small change, when combined with all of the other improvements to the 
language in recent years, gives us all the benefits of data-centric structures 
without any of the downsides of a completely new construct.  It would also 
allow the developer to opt-in to the class being readonly or not, as the 
situation requires.  The downsides of a new construct include:

1. It would either have to be a new zval type, which is a ton of work, or built 
on classes, in which case you're fighting against all of the stuff classes do.

2. Which stuff that classes do should be supported by the very-similar syntax?  
Methods?  Attributes?  Can you clone it?  Do you get a __clone() override if 
you do?  Are traits supported?  How does equality work?  I can see an argument 
for where product types (which is what we're talking about) would benefit from 
all of the above.  So we either cripple structs without useful features, or it 
becomes a lot of work to end up with "objects that pass funny."  We can get 
"objects that pass funny" with a lot less effort with just a `data` keyword 
flag.

3. If structs are entirely separate from objects, then any time we add a new 
feature to objects we'll have to debate, again, if that feature should be added 
to structs as well.  And if so, we're looking at more work for the RFC 
implementer for very little gain.  Or, we'll add a feature to structs (like 
inner classes) and then ask for it on classes, too, and again have double the 
work and double the debate.

4. The Reflection API is complicated enough as is, without having to deal with 
a whole other type of type.  As someone who maintains a serializer, that would 
be a lot of work for me to support, with no actual benefit.
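
For reference, existing classes already answer several of the questions in 
item 2, and Reflection already sees their properties.  A small sketch of 
current behavior, with `Money` as a hypothetical class:

```php
<?php

final class Money
{
    public function __construct(
        public int $amount,
        public string $currency,
    ) {}
}

$a = new Money(5, 'EUR');
$b = new Money(5, 'EUR');

// == compares class and property values; === compares identity.
$looselyEqual = ($a == $b);   // true
$identical    = ($a === $b);  // false

// clone makes a shallow copy that is equal but not identical.
$c = clone $a;
$cloneEqual     = ($c == $a);  // true
$cloneIdentical = ($c === $a); // false

// Reflection lists promoted properties like any others.
$names = array_map(
    fn (ReflectionProperty $p): string => $p->getName(),
    (new ReflectionClass(Money::class))->getProperties(),
);
```

A separate struct construct would need its own answer for each of these.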

In short, there are only two versions of structs that could realistically end up 
existing: Crippled in some way, and "objects that pass funny."

So if what we really want are objects that pass by value, let's just implement 
by-value opt-in objects and be done with it.  It's much less work, much more 
powerful, and avoids many more debates in the future.

--Larry Garfield

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
