Re: [PHP-DEV] [RFC[ Property accessor hooks, take 2

Stephen Reay Wed, 28 Feb 2024 05:51:39 -0800

> On 28 Feb 2024, at 06:17, Larry Garfield <la...@garfieldtech.com> wrote:
> 
> On Sun, Feb 25, 2024, at 10:16 PM, Rowan Tommins [IMSoP] wrote:
>> [Including my full previous reply, since the list and gmail currently 
>> aren't being friends. Apologies that this leads to rather a lot of 
>> reading in one go...]
> 
> Eh, I'd prefer a few big emails that come in slowly to lots of little emails 
> that come in fast. :-)
> 
>>> On 21/02/2024 18:55, Larry Garfield wrote:
>>>> Hello again, fine Internalians.
>>>> 
>>>> After much on-again/off-again work, Ilija and I are back with a more 
>>>> polished property access hooks/interface properties RFC.
>>> 
>>> 
>>> Hello, and a huge thanks to both you and Ilija for the continued work 
>>> on this. I'd really like to see this feature make it into PHP, and 
>>> agree with a lot of the RFC.
>>> 
>>> 
>>> My main concern is the proliferation of things that look the same but 
>>> act differently, and things that look different but act the same:
> 
> *snip*
> 
>>> - a and b are both what we might call "traditional" properties, and 
>>> equivalent to each other; a uses legacy syntax which we haven't 
>>> removed for some reason
> 
> I don't know why we haven't removed `var` either.  I can't recall the last 
> time I saw it in real code.  But that's out of scope here.
> 
> *snip*
> 
>> I think there's some really great functionality in the RFC, and would 
>> love for it to succeed in some form, but I think it would benefit from 
>> removing some of the "magic".
>> 
>> 
>> Regards,
>> 
>> -- 
>> Rowan Tommins
>> [IMSoP]
> 
> 
> I'm going to try and respond to a couple of different points together here, 
> including from later in the thread, as it's just easier.
> 
> == Re, design philosophy:
> 
>> In C#, all "properties" are virtual - as soon as you have any 
>> non-default "get", "set" or "init" definition, it's up to you to declare 
>> a separate "field" to store the value in. Swift's "computed properties" 
>> are similar: if you have a custom getter or setter, there is no backing 
>> store; to add behaviour to a "stored property", you use the separate 
>> "property observer" hooks.
>> 
>> Kotlin's approach is philosophically the opposite: there are no fields, 
>> only properties, but properties can access a hidden "backing field" via 
>> the special keyword "field". Importantly, omitting the setter doesn't 
>> make the property read-only, it implies set(value) { field = value }
> 
> A little history here to help clarify how we ended up where we are: The 
> original RFC as we designed it modeled very closely on Swift, with 4 hooks.  
> Using get/set at all would create a virtual property and you were on your 
> own, while the beforeSet/afterSet hooks would not.  We ran that design by 
> some PHP Foundation sponsors a year ago (I don't actually know who, Roman did 
> it for us), and the general feedback was "we like the idea, but woof this is 
> complicated with all these hooks and having to make my own backing property 
> for all these little things.  Couldn't this be simplified?"  We thought a bit 
> more, and I off-handedly suggested to Ilija "I mean, would it be possible to 
> just detect if a get/set hook is using a backing store and make it 
> automatically?  Then we could get rid of the before/after hooks."  He gave it 
> a quick try and found that was straightforward, so we pivoted to that 
> simplified version.  We then realized that we had... mostly just recreated 
> Kotlin's design, so shrugged happily and went on with life.
> 
> As noted in an earlier email, C#, Kotlin, and Swift all have different 
> stances on the variable name for the incoming value.  We originally modeled 
> on Swift so had that model (optional newVal name), and also because we liked 
> how compact it was.  When we switched to the simplified, incidentally 
> Kotlin-esque approach, we just kept the optional variable as it works.
> 
> I think where that ended up is pretty nice, personally, even if it is not a 
> direct map of any particular other language.
> 
> == Re asymmetric typing:
> 
> This is capability already present today if using a setter method.  
> 
> class Person {
>    private $name;
> 
>    public function setName(UnicodeString|string $name)
>    {
>        $this->name  =  $value  instanceof UnicodeString ? $value : new  
> UnicodeString($value);         
>    }
> }
> 
> And widening the parameter type in a child class is also entirely legal.  As 
> the goal of the RFC is, essentially, "make most common getter/setter patterns 
> easy to add to a property without making an API-breaking method, so people 
> don't have to add redundant just-in-case getters and setters all the time," 
> covering an easy-to-cover use case seems like a good thing to do.  
> 
> It also ties into the question of the explict/implicit name, for the reason 
> you mentioned earlier (unspecified means mixed), not by intent.  More on that 
> in another section.
> 
> == Re virtual properties:
> 
> Ilija and I talked this through, and there's pros and cons to a `virtual` 
> keyword.  Ilija also suggested a `backed` keyword, which forces a backed 
> property to exist even if it's not used in the hook itself.
> 
> * Adding `virtual` adds more work for the developer, but more clarity.  It 
> would also mean $this->$propName or $this->{__PROPERTY__} would work "as 
> expected", since there's no auto-detection for virtual-ness.  On the 
> downside, if you have a could-be-virtual property but never actually use the 
> backing value, you have an extra backing value hanging around in memory that 
> is inaccessible normally, but will still show up in some serialization 
> formats, which could be unexpected.  If you omit one of the hooks and forget 
> to mark it virtual, you'll still get the default of the other operation, 
> which could be unexpected.  (Mostly this would be for a virtual-get that 
> accidentally has a default setter because you forgot to mark it `virtual`.)
> * Doing autodetection as now, but with an added "make a backing value anyway" 
> flag would resolve the use case of "My set hook just calls a method, and that 
> method sets the property, but since the hook doesn't mention the property it 
> doesn't get created" problem.  It would also allow for $this->$propName to 
> work if a property is explicitly backed.  On the flipside, it's one more 
> thing to think about, and the above example it solves would be trivially 
> solved by having the method just return the value to set and letting the set 
> hook do the actual write, which is arguably better and more reliable code 
> anyway.
> * The status quo (auto-detection based on the presence of $this->propName).  
> This has the advantage it "just works" in the 95% case, without having to 
> think about anything extra.  The downside is it does have some odd edge 
> cases, like needing $this->propName to be explicitly used.  
> 
> I don't think any is an obvious winner.  My personal preference would be for 
> status quo (auto-detect) or explicit-virtual always.  I could probably live 
> with either, though I think I'd personally favor status quo.  Thoughts from 
> others?
> 

I agree that a flag to make the field *virtual* (thus disabling the backing 
store) makes more sense than a flag to make it backed; It's also easier to 
understand when comparing hooked properties with regular properties 
(essentially, backed is the default, you have to opt-in to it being virtual). I 
don't think the edge cases of "auto" make it worthwhile just to not need 
"virtual". 

> == Re reference-get
> 
> Allowing backed properties to have a reference return creates a situation 
> where any writes would then bypass the set hook, and thus any validation 
> implemented there.  That is, it makes the validation unreliable.  A major 
> footgun.  The question is, do we favor caveat-emptor flexibility or 
> correct-by-construction safety?  Personally I always lead toward the latter, 
> though PHP in general is... schizophrenic about it, I'd say. :-)
> 
> At this point, we'd much rather leave it blocked to avoid the issue; it's 
> easier to enable that loophole in the future if we really want it than to get 
> rid of it if it turns out to have been a bad idea.
> 
> There is one edge case that *might* make sense: If there is no set hook 
> defined, then there's no set hook to worry about bypassing.  So it may be 
> safe to allow &get on backed properties IFF there is no set hook.  I worry 
> that is "one more quirky edge case", though, so as above it may be better to 
> skip for now as it's easier to add later than remove.  But if the consensus 
> is to do that, we're open to it.  (Question for everyone.)
> 

I don't have strong feeling about this, but in general I usually tend to prefer 
options that are consistent, and give power/options to the developer. If 
references are opt-in anyway, I see that as accepting the trade-offs. If a 
developer doesn't want to allow by-ref modifications of the property, why would 
they make it referenceable in the first place? This sounds a bit like 
disallowing regular public properties because they might be modified outside 
the class - that's kind of the point, surely.

> == Re 
> 
> == Re arrays
> 
>> Regarding arrays, have you considered allowing array-index writes if
> an &get hook is defined? i.e. "$x->foo['bar'] = 42;" could be treated
> as semantically equivalent to "$_temp =& $x->foo; $_temp['bar'] = 42;
> unset($_temp);"
> 
> That's already discussed in the RFC:
> 
>> The simplest approach would be to copy the array, modify it accordingly, and 
>> pass it to set hook. This would have a large and obscure performance 
>> penalty, and a set implementation would have no way of knowing which values 
>> had changed, leading to, for instance, code needing to revalidate all 
>> elements even if only one has changed.
> 
> Unless we were OK with that bypassing the set hook entirely if defined, 
> which, as noted above, means any safety guarantees provided by a set hook are 
> bypassed, leading to untrustworthy code.
> 
> == Re hook shorthands and return values
> 
> Ilija and I have been discussing this for a bit, and we've both budged a 
> little. :-)  Here's our counter-proposal:
> 
> - Drop the "top level" shorthand, for get-only hooks.
> - Keep the => shorthand for the get hook itself.
> - For a set hook, the {} form has no return value; set the value yourself 
> however you want.
> - For a set hook, the => form implies a backed value and will set the 
> property to whatever value that evaluates to.
> 
> So these are equivalent:
> 
> public $foo { set { $this->foo = $value; } }
> public $foo { set => $value; }
> 
> These are equivalent:
> 
> public string $foo {
>  get {
>    return strtoupper($this->foo);
>  }
> }
> public string $foo { get => strtoupper($this->foo); }
> 
> And this goes away:
> 
> public string $foo => strtoupper($this->foo);
> 
> That covers the common cases with an arrow-function-like syntax that behaves 
> as you'd expect (it returns things), and allows a longer version with 
> arbitrarily complex logic if desired.  It also means that each syntax variant 
> does mean something importantly different.
> 
> Would that be an acceptable compromise?  (Question for everyone.)
> 

I think the examples given are clear, and the lack of the top-level short 
closure-esque version makes it more obvious. Forgive me, I must have missed 
some of the previous comments - is there a reason the 'full' setter can't 
return a value, for the sake of consistency? I understand that you don't want 
"return to set" to be the *only* option, for the sake of e.g. change/audit 
logging type functionality (i.e. set and then some action to record that the 
change was made), but it seems a little odd and inconsistent to me that the 
return value of a short closure would be used when the return value of the long 
version isn't. This isn't really a major issue, I'm just curious if there was 
some explanation about it?

> == Re the $value variable in set
> 
> Honestly, Rowan's earlier point here is the strongest argument in favor for 
> me of the current RFC approach.  Anywhere else in PHP, something that looks 
> like a parameter and has no type, like `($param)`, means its type is `mixed`. 
>  It would be weird and confusing to be different here.  That's above and 
> beyond the issue of forcing people to retype something obvious every time.  
> (I cite again, recent PHP's trend toward removing needless boilerplate, which 
> is very good.)  Requiring that the type be specified, for consistency, makes 
> little sense if the type is not allowed to vary.  You're just repeating a 
> string from earlier on the same line, for no particular benefit.
> 
> I genuinely don't understand the pushback on $value.  It's something you 
> learn once and never have to think about again.  It's consistent.
> 
> Ilija jokingly suggested making it always $value, unconditionally, and 
> allowing only the type to be specified if widening:
> 
> public int $foo { set(int|float) => floor($value); }
> 
> Though I suspect that won't go over well, either. :-)
> 
> So what makes the most sense to me is to keep $value optional, but IF you 
> specify an alternate name, you must also specify a type (which may be wider). 
>  So these are equivalent:
> 
> public int $foo { set (int $value) => $value + 1 }
> public int $foo { set => $value + 1 }
> 
> And only those forms are legal.  But you could also do this, if the situation 
> called for it:
> 
> public int $foo { set(int|float $num) => floor($num) + 1; }
> 
> This "all or nothing" approach seems like it strikes the best balance, gives 
> the most flexibility where needed while still having the least redundancy 
> when not needed, and when a name/type is provided, its behavior is the same 
> as for a method being inherited.
> 
> Does that sound acceptable?  (Again, question for everyone.)
> 

My only question with this is the same as I had in an earlier reply (and I'm 
not sure it was ever answered directly?), and you allude to this yourself: 
everywhere *else*, `($var)` means a parameter with type `mixed`. Why is the 
type *required* here, when you've specifically said you want to avoid 
boilerplate? If we're going to assume people can understand that `(implicit 
property-type $value) is implicit, surely we can also assume that they will 
understand "specifying a parameter without a type" means the parameter has no 
type (i.e. is `mixed`).

Again, for myself I'd be likely to type it (or regular parameters, properties, 
etc) as `mixed` if that's what I want *anyway*, but the inconsistency here 
seems odd, unless there's some until-now unknown drive to deprecate type-less 
parameters/properties/etc.


> The alternative that gives the most future-flexibility is to do neither: The 
> variable is called $value, period, you can't change it, and you can't change 
> the type, either.  There is no () after set, ever.  Punt both of those to a 
> later follow-up.  I'd prefer to include both now, but including neither now 
> is the next-safer option.
> 
> 
> ## Regarding $field
> 
> Sigh, now y'all like it. :-P  Most of the feedback on this has been negative, 
> so I'm inclined to leave it out at this point, unless there's a major swing 
> in feedback to bring it back.  But the RFC seems more likely to pass without 
> it than with right now.
> 
> --Larry Garfield
> 


Cheers


Stephen
Re: [PHP-DEV] [RFC[ Property accessor hooks, take 2

Reply via email to