Re: [PHP-DEV] A little syntactic sugar on array_* function calls?

Nikita Popov Fri, 28 May 2021 07:31:49 -0700

On Fri, May 28, 2021 at 3:11 AM Mike Schinkel <m...@newclarity.net> wrote:

> > On May 26, 2021, at 7:44 PM, Hendra Gunawan <the.liquid.me...@gmail.com> wrote:
> >
> > Hello.
> >
> >>
> >> Yes, but Nikita wrote this note about technical limitations at the
> >> bottom of the repo README:
> >>
> >> Due to technical limitations, it is not possible to create mutable APIs
> >> for primitive types. Modifying $self within the methods is not possible
> >> (or rather, will have no effect, as you'd just be changing a copy).
> >>
> >
> > If it is solved, this is a great accomplishment for PHP. But I think
> > scalar object is not going anywhere in the near future. If you are not
> > convinced, please take a look at
> > https://github.com/nikic/scalar_objects/issues/20#issuecomment-569520181
>
> Nikita's comment actually causes me more questions, not fewer.
>
> Nikita says "We need to know that $a[$b][$c] is an array in order to
> determine that the call should be performed by-reference. However, we
> already need to convert $a, $a[$b] and $a[$b][$c] into references before we
> know about that."
>
> How then are we able to do the following?:
>
> $a[$b][$c][] = 1;
>

In this case, we're clearly performing a write operation on the array. If
you want to know the technical details, the compiler will convert this into
a sequence of FETCH_DIM_W ops followed by ASSIGN_DIM. The "W" bit here is
for "write", which will perform all the necessary special handling, such as
copy-on-write separation and auto-vivification.
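
A minimal sketch of what the write fetch means at the user level. The
variable names ($shared, $alias) and the key values are made up for
illustration; only the $a[$b][$c][] = 1 statement comes from the question.

```php
<?php
// Auto-vivification: the write fetch creates the missing
// intermediate arrays on the way down to the final append.
$a = [];
$b = 'x';
$c = 'y';
$a[$b][$c][] = 1;

// Copy-on-write separation: $alias shares storage with $shared
// until the write fetch, which separates it before modifying.
$shared = ['x' => ['y' => [1]]];
$alias = $shared;
$alias[$b][$c][] = 2;

var_dump($a === ['x' => ['y' => [1]]]);        // auto-vivified structure
var_dump($shared === ['x' => ['y' => [1]]]);   // original left untouched
var_dump($alias === ['x' => ['y' => [1, 2]]]); // only the alias changed
```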

> How also can we do this:
>
> byref($a[$b][$c]);
> function byref(&$x) {
>     $x[]= 2;
> }
>
> See https://3v4l.org/aPvTD
>

This is a more complex case. In this case the compiler doesn't know in
advance whether the argument is passed by value or by reference. What
happens here is:

1. INIT_FCALL determines that we're calling byref().
2. CHECK_FUNC_ARG for the first arg determines that this argument is passed
by-reference for this function.
3. FETCH_DIM_FUNC_ARG on the array will perform either a FETCH_DIM_R or a
FETCH_DIM_W operation, depending on what CHECK_FUNC_ARG determined.
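
The observable effect of these steps can be sketched as follows. The
function names append_by_ref() and append_by_value() are hypothetical, and
the callee is picked through a variable so that it is only known at runtime.
The same argument expression ends up being fetched for read or for write
depending on the callee's signature.

```php
<?php
function append_by_ref(&$x) { $x[] = 2; }
function append_by_value($x) { $x[] = 2; return $x; }

$a = ['p' => ['q' => [1]]];

// By-value callee: the argument is fetched for read, $a is not touched.
$fn = 'append_by_value';
$fn($a['p']['q']);
$afterValue = $a;

// By-reference callee: the same expression is now fetched for write.
$fn = 'append_by_ref';
$fn($a['p']['q']);

// The write fetch also auto-vivifies offsets that do not exist yet.
$fn($a['new']['key']);

var_dump($afterValue); // still the initial array
var_dump($a);          // 'q' gained an element, 'new' => 'key' was created
```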

> I assume that in both my examples $a[$b][$c] would be considered an
> "lvalue"[1] and can be a target of assignment triggered by either the
> assignment operator or calling the function and passing to a by-ref
> parameter.
>
> [1]
> https://en.wikipedia.org/wiki/Value_(computer_science)#Assignment:_l-values_and_r-values
>
> So is there a reason that -> on an array could not trigger the same?  Is
> Nikita saying that the performance of those calls performed by-reference
> would not matter because they are always being assigned, at least in the
> former case, but to do so with array expressions would be problematic?
> (Ignoring there is no code in the wild that currently uses the -> operator,
> or does that matter?)
>

Note that the byref($a[$b][$c]) case only works because we know which
function is being called at the time the argument is passed. If you have
$a[$b][$c]->test() we need to pass $a[$b][$c] by reference (FETCH_DIM_W) or
by value (FETCH_DIM_R) depending on whether $a[$b][$c]->test() accepts the
argument by-value or by-reference. But we can only know that once we have
already evaluated $a[$b][$c] and found out that it is indeed an array.

The only way around this is to *always* perform a for-write fetch of
$a[$b][$c], even though we don't know that the end result is going to be an
array. However, doing so would pessimize the performance of code operating
on objects. Consider $some_huge_shared_array[0]->foo(). If we fetch
$some_huge_shared_array for write, we'll be required to perform a full
duplication of the array in preparation for a possible future write. If it
turns out that $some_huge_shared_array[0] is actually an object, or that
$some_huge_shared_array[0] is an array and the performed operation is
by-value, then we have performed this copy unnecessarily.

I don't believe this is acceptable.
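
The cost can be made visible with a rough sketch. Since ->method() on arrays
does not exist, a by-reference function that never writes stands in for the
"always fetch for write" strategy. The function names, the array size and
the use of memory_get_usage() as a measuring stick are all assumptions made
for illustration, and the exact numbers will vary by PHP version.

```php
<?php
function takes_ref(&$x) {}   // by-reference, but never actually writes
function takes_value($x) {}  // by-value

$huge = range(1, 100000);
$shared = $huge;             // both variables share one array storage

$m0 = memory_get_usage();
takes_value($shared[0]);     // read fetch: no separation needed
$m1 = memory_get_usage();
takes_ref($shared[0]);       // write fetch: $shared is duplicated first
$m2 = memory_get_usage();

$readCost = $m1 - $m0;
$writeCost = $m2 - $m1;
printf("read fetch: %d bytes, write fetch: %d bytes\n", $readCost, $writeCost);
```

The duplication happens even though takes_ref() modifies nothing, which is
the unnecessary copy described above.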

> I ask honestly to understand, and not as a rhetorical question.
>
> Additionally, if the case of updating an array variable is not a problem
> but updating an array expression is a problem then why not just limit the
> -> operator to only work on expressions for immutable methods and require
> variables for mutable methods?  I would think it should be easy enough to
> throw an error for those specific "methods" that would be mutable, such as
> shift() and unshift() if $a[$b][$c]->shift('foo') were called?
>

There are externalities associated even with the simple $x->foo() case,
though they are less severe. They primarily involve reduced ability to
analyze code in opcache.

In either case, this limitation does not seem reasonable to me from a
language design perspective. If $a->push($b) works, then $a[$k]->push($b)
can reasonably be expected to work as well.

> Or maybe just completely limit the -> operator to array variables.
> Don't work on any array expressions, for consistency. There is already
> precedent in PHP for operators that work on variables and not on
> expressions:  ++, --, and &.
>
> IF we can get a thumbs up from Nikita that one of these would actually be
> possible then I think the next step should be to write up a list of
> proposed array methods that would be implemented to support the -> operator
> with arrays and put them in an RFC, and to flesh out any edge cases.
>

The only correct way to resolve this issue is to not support mutable
operations.

I don't think there's much need for mutable operations. sort() and
shuffle() would be best implemented by returning a new array instead.
array_push() is redundant with $array[]. array_shift() and array_unshift()
should never be used. array_pop() and array_splice() are the only sensible
mutable array methods that come to mind, and I daresay we can do without
them.
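
A minimal sketch of the non-mutating style, using a hypothetical sorted()
helper written in userland (it is not an existing PHP function):

```php
<?php
// Hypothetical helper: returns a sorted copy and leaves the input alone.
function sorted(array $xs): array
{
    sort($xs); // sorts the local copy only, since $xs is passed by value
    return $xs;
}

$original = [3, 1, 2];
$result = sorted($original);

// Appending with [] has the same effect as array_push($stack, 3).
$stack = [1, 2];
$stack[] = 3;

var_dump($original, $result, $stack);
```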

Regards,
Nikita
