Re: [PHP-DEV] [Pre-RFC Discussion] User Defined Operator Overloads (again)

Mike Schinkel Tue, 17 Sep 2024 02:16:01 -0700

> On Sep 17, 2024, at 1:37 AM, Jordan LeDoux <jordan.led...@gmail.com> wrote:
> On Mon, Sep 16, 2024 at 9:35 PM Mike Schinkel <m...@newclarity.net> wrote:
> 
> Yes, if constraints of the nature I propose below are adopted.
> 
> The biggest problem I have with operator overloads is that — once added — all 
> code could potentially be "infected" with operator overloads. However, if the 
> developer *using* an operator overload could instead opt-in to using them, in 
> context, then I would flip my opinion and I would begin to support them.  
> 
> What might opt-in look like?  I propose two (2) mechanisms of which each 
> would be useful for different use-cases. As such I do not see these two as 
> competing but instead would expect adding both to be preferable:
> 
> 1. Add a pair of sigils to enclose any expression that would need to support 
> userland operator overloading. This would allow a developer to isolate just 
> the expression that needs to use operator overloading. I propose {[...]} for 
> this, but feel free to bikeshed sigils. Using an example from the RFC, here 
> is what code might look like:
> 
> $cnum1 = new ComplexNumber(1, 2);
> $cnum2 = new ComplexNumber(3, 4);
> $cnum3 = {[ $cnum1 * $cnum2 ]};    // Uses operator overloading sigils
> echo $cnum3->realPart.' + '.$cnum3->imaginaryPart.'i';
> 
> 2. For when using `{[...]}` would be annoying because it would be needed in 
> so many places, PHP could also add support for an attribute. e.g. 
> `#[OperatorOverloads(Userland:true)]`. This attribute would apply to 
> functions, methods, classes, enums, (other?) and indicates that operator 
> overloads can be present anywhere in the body of the decorated structure. I 
> included `Userland:true` as an indicator to a reader that this only applies 
> to userland operator overloads and that built-in ones like in GMP and 
> anywhere else would not need to be opted into, but that parameter could of 
> course be dropped if others feel it is not needed. Again, feel free to 
> bikeshed attribute name and/or parameters.
> 
> #[OperatorOverloads(Userland:true)]
> function SprintProductOfTwoComplex(ComplexNumber $cnum1, ComplexNumber $cnum2): string {
>   $cnum3 = $cnum1 * $cnum2;
>   return sprintf("%d + %di", $cnum3->realPart, $cnum3->imaginaryPart);
> }
> 
> If this approach were included in the RFC then it would also ensure there is 
> no possibility of BC breakage. Such breakage would certainly be an edge 
> case, but I can envision it being possible, especially where newer 
> instances incorporating operator overloads are passed to functions that did 
> not have type-hinted parameters and were not intended to be used with 
> operator overloads, resulting in subtle potential breakage. 
> 
> This argument is also consistent with the argument people had about not 
> allowing default values to be generically used in calls to the function. 
> Their claim was that developers who did not write their code with 
> the intention of exposing defaults should not have their defaults exposed. 
> Similarly, code from developers who did not write it to enable operator 
> overloads should not be used with userland operator overloads unless they 
> explicitly allow it, especially as they may not have tested their code with 
> operator overloads.
> 
> Anyway, that is my two cents worth. 
> 
> TL;DR?  I argue that PHP should add operator overloads but ONLY if there is a 
> mechanism that requires the user of expressions that call overloaded 
> operators to explicitly opt-in to their use.
> 
> -Mike
> 
> 
> This is interesting, as I've never seen this in any language I researched as 
> part of operator overloading, and also was never given this feedback or 
> anything similar by anyone who provided feedback before.


If all language features required prior art, there would never be innovation in 
programming languages. So for anything that currently exists, there was always 
a first language that implemented it. 

Of course when there is prior art we can use the heuristic of "All these have 
done it before so it must be a good idea."  But lack of prior art should not be 
the reason to dismiss something; it should be evaluated on its merits.

> My initial reaction is that I do not understand how this is any better than 
> parameter typing. If you do not allow any objects into the scope you are 
> using operators, wouldn't that be the same as the kind of userland control 
> you are after? Or rather, how would it be substantially worse?

How would a developer know if they are using an object that has operators, 
unless they study all the source code or at least the docs (assuming there are 
good docs, which there probably are not)? 

It might be illustrative to explicitly call out different scenarios I envision 
in case some are not obvious.  

There are:

1. Internal projects that are almost entirely bespoke code, with an active team 
where the code is run by the code owners. Think a big company's internal 
operations.

2. Agencies that build web projects using frameworks and libraries for clients.

3. Smaller companies using frameworks and libraries for internal use, with a 
small team that may have many other duties, or those who outsource to 
contractors when they need things, and breakage for them can be very painful.

4. Framework developers

5. Library developers

6. And probably a bunch of other scenarios, each slightly different.

Each of those scenarios has a different level of knowledge about the code they 
work on. I'd expect #2 & #3 to have the least knowledge of the code they use 
and would be most affected by other people's code doing things they do not 
expect.

I'd argue that #1 would have better knowledge of their code and would be less 
affected by other people's code, except they probably have a huge amount of 
bespoke code so one developer likely does not know what another developer is 
doing, especially if they have teams that develop tools for other teams 
to use.

Lastly #4 and #5 likely know their codebases the best, but they may create 
footguns for developers in category #2 and #3 if the language allows them to. 
And vice-versa.

So back to your question "If you do not allow any objects into the scope you 
are using operators wouldn't that be the same as the kind of userland control 
you are after?" So I ask: how do I know if the objects I am using that were 
developed by others use operators or not? With free-rein userland operator 
overloads we would be required to dig into the source of the code written by 
others that we use to know whether they have operators and how they work. 

OTOH with my suggestion, we will know because the code will crash when no 
opt-in is used.

Note, I refer to cases where code that calls code evolves, uses dynamic 
programming, and/or accepts mixed types. And I am especially talking about when 
developers create classes to wrap a built-in type and then implement operators, 
but add special cases to them such as a String() class that implements the 
concatenation operator but with a twist.

> Your second example even includes a function that only accepts a 
> `ComplexNumber` object. I presume in your example there that if the Attribute 
> was removed, the function would just always produce a fatal error, since that 
> is the behavior of objects when used with `*`.

Yes, that was the intention for the attribute, or lack of attribute in the case 
you describe.
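
For reference, here is a rough sketch of what that failure looks like on 
today's PHP 8, where no userland overloads exist. The ComplexNumber class 
below is only my stand-in for the one in the RFC, and I am assuming the 
"no opt-in" case under my proposal would fail in the same way:

```php
<?php
declare(strict_types=1);

// Stand-in for the RFC's ComplexNumber; no operator overloads are defined.
final class ComplexNumber
{
    public function __construct(
        public int $realPart,
        public int $imaginaryPart
    ) {}
}

// Same function as in my earlier example, but without the attribute.
function SprintProductOfTwoComplex(ComplexNumber $cnum1, ComplexNumber $cnum2): string
{
    // PHP 8 throws a TypeError here because objects do not support `*`.
    $cnum3 = $cnum1 * $cnum2;
    return sprintf("%d + %di", $cnum3->realPart, $cnum3->imaginaryPart);
}

$failed = false;
try {
    echo SprintProductOfTwoComplex(new ComplexNumber(1, 2), new ComplexNumber(3, 4)), PHP_EOL;
} catch (TypeError $e) {
    // The failure is immediate and loud, not a subtle wrong result.
    $failed = true;
    echo 'TypeError: ', $e->getMessage(), PHP_EOL;
}
```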

> 
> What it appears to me your proposal does is transform working operator 
> overloads into fatal errors if the user-code does not "opt-in".

Correct.

> But any such code would never actually survive long, wouldn't it?

That is the feature, not a bug. 

> Without the opt-in, these objects would ALWAYS produce fatal errors (which is 
> what happens now),

Well, we do not have operator overloads right now. With operator overloads they 
could run without crashing but have subtle bugs.  

Note I am not referring to highly specific functions written for highly 
specific classes which is what I suspect you are envisioning. Based on your 
past comments those seem to be the areas you operate in, i.e. math-related. 

I am instead referring to code that is written to be generic but that ends up 
running code it did not intend to run because of edge cases that are exposed by 
userland operators.

> which would eventually show up in testing, QA, etc.

Eventually.  Assuming they have a good testing and QA process, which many PHP 
projects do not. PHP is a lowest-common-denominator language because it is one 
of the easiest to get started with. Many less experienced PHP developers do not 
have good testing and QA processes.

But even if they do have good testing and QA, the sooner the bugs appear the 
less likely they will get deployed.

> The developer would realize that they (presumably) were trying to do a math 
> operation on something they thought was only a numeric type, and then guard 
> against objects being passed into that context with control statements, 
> parameter types, etc.

Exactly. In my proposed concept they would rework their expressions to opt-in 
to using the overloaded operators once they ensure that they understand how the 
code operates.

> So it seems to me what this ACTUALLY guards against is developers who 
> inadvertently don't type-check their variables in code where the specific 
> type is relevant.

OR do not fully know the details of the types they are using.

OR they are using types that have been upgraded to now support operator 
overloading, but they do not realize that.

> After one round of testing, all of the code using operators would either 
> always allow objects and thus overloads, or never allow objects and thus not 
> use overloads.

That assumes they crash. I am concerned for when they do not crash but instead 
have subtle bugs.

> There shouldn't even be any existing code that would be affected, since any 
> existing code would need to currently allow objects in a context where 
> operators are used, which currently produces a fatal error 100% of the time, 
> (excepting internal classes which are mostly final anyway, and thus 
> unaffected by this proposal).

It is correct that no old code can call other old code and use operators on 
objects.

But *new* code could call old code and then that old code could be made to run 
operators without ever intending to be run in that manner.
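
We can already see a small version of this today with the built-in overloads 
in GMP. The sketch below is only an illustration of mine: half() is a made-up 
stand-in for "old" untyped code, and it assumes the optional gmp extension is 
loaded. Nothing crashes; the old code just quietly returns a different kind of 
answer:

```php
<?php
declare(strict_types=1);

// "Old" generic code: untyped, written with only int and float in mind.
function half($n)
{
    return $n / 2;
}

// With a plain integer, `/` yields a float.
var_dump(half(7)); // float(3.5)

// GMP is an optional extension, so only run this part when it is available.
if (extension_loaded('gmp')) {
    // "New" code passes an object whose `/` is overloaded by the extension.
    $result = half(gmp_init(7));

    // No error is raised: GMP division truncates, so the result is 3, not 3.5.
    echo gmp_strval($result), PHP_EOL;
}
```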

> What is the situation where your suggestion is implemented, a developer does 
> NOT opt-in to overloads, and they avoid unexpected behavior without having to 
> change their existing code to fix fatal errors? I don't see how that is 
> possible.

In your hypothetical it appears you referred to only one developer. But where I 
see issues is when there are two or more developers; a producer of functions 
and a consumer of functions.

Situation where there is free-rein userland operator overloading:  Junior 
developer Joe is using Symfony and learns about this great new operator 
overload feature so decides to implement all the operators for all his objects, 
and now he wants to start passing his objects to Symfony code. Joe decides to 
be clever and implement "/" to concatenate path strings together but doesn't 
type his properties, and he ends up passing them to a Symfony function that 
uses `/` for division, and his program crashes with very cryptic error 
messages.  He reports them to the Symfony developers, and it wastes a bunch of 
time for everyone until they finally figure out why it failed, because nobody 
ever considered a developer would do such a thing.

Same scenario but with required opt-in. Joe does the same thing but this time 
he gets a very clear message that says "Symfony Widget does not support 
operator overloads."  He googles and quickly finds out what that means and 
then goes to ask the Symfony team to support operator overloads. They can 
choose to either add support, or not, but it is up to them if they want to open 
the can of worms related to support that operator overloading might cause.
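
To make the opt-in check a bit more concrete, here is a rough userland 
emulation using reflection. Everything in it is hypothetical: the 
OperatorOverloads attribute does not exist in PHP, assertOptedIn() is just my 
stand-in for a check the engine itself would perform, and the error wording is 
only a guess at what a clear message might look like:

```php
<?php
declare(strict_types=1);

// Hypothetical attribute from my proposal; it is not part of PHP.
#[Attribute(Attribute::TARGET_FUNCTION | Attribute::TARGET_METHOD | Attribute::TARGET_CLASS)]
final class OperatorOverloads
{
    public function __construct(public bool $Userland = true) {}
}

// Userland stand-in for the check the engine would make before it
// dispatches a userland operator overload inside the named function.
function assertOptedIn(string $function): void
{
    $attributes = (new ReflectionFunction($function))
        ->getAttributes(OperatorOverloads::class);

    if ($attributes === []) {
        throw new Error("$function() does not support operator overloads.");
    }
}

// A function whose author has opted in.
#[OperatorOverloads(Userland: true)]
function optedIn(): void {}

// A function written without operator overloads in mind.
function legacy(): void {}

assertOptedIn('optedIn'); // passes silently

$message = null;
try {
    assertOptedIn('legacy');
} catch (Error $e) {
    // The caller gets a clear, searchable message instead of a cryptic failure.
    $message = $e->getMessage();
    echo $message, PHP_EOL;
}
```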

> Also, replying into a 3 year old reddit thread I linked to for reference is 
> not what I intended, however I want to highlight one other thing you 
> commented there but not here for some reason:
> 
> > To illustrate my point, imagine if we also allowed control structure 
> > overloads. If we had them we could no longer read code and know that an 
> > `if` is a branch and a `for` is a loop; either could be anything valid for 
> > any control structure. Talk about ambiguity! 
> 
> Indeed. I want to make sure that I have not been ambiguous after reading 
> this, because I found it somewhat troubling:
> 
> I am looking at writing an RFC for specific *operators* that are finite and 
> defined within the RFC. I am not proposing something that would allow control 
> structures to be altered (I don't even think that would be possible without 
> essentially rewriting the entire Zend Engine specifically to do it).
> 
> Operators are not control structures. Operators mutate the value or state of 
> a variable in a repeatable way, given the input states. There is not even a 
> generalized mechanism in my RFC for "arbitrary" overloads, and the compiler 
> was not implemented in a way that is generalized for it either. It allows 
> only exactly the operators that are part of the RFC, and each are handled 
> specifically and individually.

I was ONLY using control structures as a more extreme analogy to operator 
overloading to try to illustrate how — the more things you make configurable in 
a language — the more you allow the ground to shift beneath a developer's feet, 
so to speak.

An approach I use when trying to understand something that might be subtle is 
to ask myself what a more extreme example is that would be analogous and then I 
consider that.  

So I was not saying you proposed that, I was equating control structure 
overloading to operator overloading, but I explicitly meant control structure 
overloading would be a more extreme opening up of PHP than operator 
overloading.  

Clearly control structure overloading would be bad. I was trying to make the 
point that operator overloading would cause problems for the same reason, even 
if the problems would not be as extreme.

I am sorry that my wording did not make it clear that I was using an analogy, 
not referring to your RFC.

Anyway, as a closing for this email, I know you badly want operator overloading 
but there were enough people who disliked the idea to vote against it last time 
so, assuming my proposal could satisfy them too, it seems like a great 
compromise to give you true operator overloading with just a little extra 
boilerplate while at the same time allowing developers to limit the scope of 
operator overloads to just those functions where they want to enable it. 

What's more, if after a few years we find out that my concerns really were for 
naught then a future RFC could open it up and remove the opt-in requirement. 

But one thing is certain: if we open up operator overloading completely from 
day one, we could never go back to opt-in.

-Mike
