Re: [PHP-DEV] [RFC] Pipe Operator (again)

Ilija Tovilo Thu, 10 Apr 2025 16:06:34 -0700

Hi Larry

Sorry again for the delay.

On Fri, Apr 4, 2025 at 6:37 AM Larry Garfield <[email protected]> wrote:
>
> * A new iterable API is absolutely a good thing and we should do it.
> * That said, we *need* to split Sequence, Set, and Dictionary into separate 
> types.  We are the only language I reviewed that didn't have them as separate 
> constructs with their own APIs.
> * The use of the same construct (arrays and iterables) for all three types is 
> a fundamental and core flaw in PHP's design that we should not double-down 
> on.  It's ergonomically awful, it's bad for performance, and it invites major 
> security holes.  (The "Drupageddon" remote exploit was caused by using an 
> array and assuming it was sequential when it was actually a map.)
>
> So while I want a new iterable API, the more I think on it, the more I think 
> a bunch of map(iterable $it, callable $fn) style functions would not be the 
> right way to do it.  That would be easy, but also ineffective.
>
> The behavior of even basic operations like map and filter is subtly 
> different depending on which type you're dealing with.  Whether the input is 
> lazy or not is the least of the concerns.  The bigger issue is when to pass 
> keys to the $fn; probably always in Dict, probably never in Seq, and 
> certainly never in Set (as there are no meaningful keys).  Similarly, when 
> filtering a Dict, you would want keys preserved.  When filtering a Seq, you'd 
> want the indexes re-zeroed.  (Or to seem like it, give or take 
> implementation details.)  And then, yes, there's the laziness question.
>
> So we'd effectively want three different versions of map(), filter(), etc. if 
> we didn't want to perpetuate and further entrench the design flaw and 
> security hole that is "sequences and hashes are the same thing if you 
> squint."  And... frankly I'd probably vote against an iterable/collections 
> API that didn't address that issue.

I fundamentally disagree with this assessment. In most languages,
including PHP, iterators are simply a sequence of values that can be
consumed. Usually, the consumer should not be concerned with the data
structure being iterated; this is abstracted away by the
iterator. In most languages, both Sequences and Sets are translated
1:1 (i.e. Sequence<T> => Iterator<T>, Set<T> => Iterator<T>).
Dictionaries usually result in a tuple, combining both the key and
value into a single value pair (Dict<T, U> => Iterator<(T, U)>). PHP
is a bit different in that all iterators require a key. Semantically,
this makes sense for both Sequences (which are logically indexed by
the element's position in the sequence, so Sequence<T> => Iterator<int,
T>) and Dicts (which have an explicit key, so Dict<T, U> =>
Iterator<T, U>). Sets don't technically have a logical key, but IMO
this is not enough of a reason to fundamentally change how iterators
work. A sequential number would be fine, which is also what yield
without an explicit key produces. If we really wanted to avoid that, we
could make the key null, as this is already allowed for generators.
https://3v4l.org/LvIjP
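
As a minimal sketch of the two key strategies mentioned above for a
Set-like source (the function names sequentialKeys(), nullKeys() and
collectKeys() are made up for illustration and are not part of any
proposal):

```php
<?php
// Sketch only: a Set-like source exposed as a generator, with two key strategies.

/** Yields each value with PHP's default auto-incremented integer key. */
function sequentialKeys(array $values): Generator
{
    foreach ($values as $value) {
        yield $value; // no key given, so the keys are 0, 1, 2, ...
    }
}

/** Yields each value with an explicit null key. */
function nullKeys(array $values): Generator
{
    foreach ($values as $value) {
        yield null => $value;
    }
}

/** Collects the keys an iterator produces without using them as array keys. */
function collectKeys(iterable $it): array
{
    $keys = [];
    foreach ($it as $key => $_) {
        $keys[] = $key;
    }
    return $keys;
}

$autoKeys = collectKeys(sequentialKeys(['a', 'b', 'c'])); // [0, 1, 2]
$nullKeyList = collectKeys(nullKeys(['a', 'b', 'c']));    // [null, null, null]
```

Either way the consumer still sees an ordinary key/value iterator; only
the meaning of the key differs.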

The big upside of treating all iterators the same, regardless of their
data source, is twofold. 1. The code becomes more generic: you don't
need three variants of a value map() function when one works on all of
them. 2. You can populate any of the data structures from a generic
iterator without any data shuffling.

$users
    |> Iter\mapKeys(fn($u) => $u->getId())
    |> Iter\toDict();

This will work if $users is a Sequence, Set or existing Dict with some
other key. Actually, it works for any Traversable. If mapKeys() only
applied to Dict iterators you would necessarily have to create a
temporary dictionary first, or just not use the iterator API at all.
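
A rough userland approximation of the example above, assuming
mapKeys() and toDict() behave as described. These are hypothetical
stand-ins for Iter\mapKeys() and Iter\toDict(), written as plain
functions taking the iterable as the first argument, and a plain array
stands in for a Dict. The User class is also invented for the sketch.

```php
<?php
// Hypothetical stand-ins for Iter\mapKeys() and Iter\toDict(); neither exists in PHP today.

function mapKeys(iterable $it, callable $fn): Generator
{
    foreach ($it as $value) {
        yield $fn($value) => $value; // new key derived from the value, old key dropped
    }
}

function toDict(iterable $it): array
{
    $dict = [];
    foreach ($it as $key => $value) {
        $dict[$key] = $value; // a plain array stands in for a Dict here
    }
    return $dict;
}

final class User
{
    public function __construct(private int $id, public string $name) {}

    public function getId(): int
    {
        return $this->id;
    }
}

$byId = fn(User $u) => $u->getId();

// Three differently shaped sources: a list, a map with another key, a generator.
$asList = [new User(7, 'Ann'), new User(9, 'Bob')];
$asMap = ['ann' => new User(7, 'Ann'), 'bob' => new User(9, 'Bob')];
$asGenerator = (function () {
    yield new User(7, 'Ann');
    yield new User(9, 'Bob');
})();

$fromList = toDict(mapKeys($asList, $byId));
$fromMap = toDict(mapKeys($asMap, $byId));
$fromGenerator = toDict(mapKeys($asGenerator, $byId));
```

All three results are keyed by user ID (7 and 9), and no intermediate
dictionary is built along the way.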

> However, a simple "first arg" pipe wouldn't allow for that.  Or rather, we'd 
> need to implement seqMap(iterable $it, callable $fn), setMap(iterable $it, 
> callable $fn), and dictMap(iterable $it, callable $fn).  And the same split 
> for filter, and probably a few other things.  That seems ergonomically 
> suspect, at best, and still wouldn't really address the issue since you would 
> have no way to ensure you're using the "right" version of each function. 
> Similarly, a dict version of implode() would likely need to take 2 
> separators, whereas the other types would take only one.
>
> So the more I think on it, the more I think the sort of iterable API that 
> first-arg pipes would make easy is... probably not the iterable API we want 
> anyway.  There may well be other cases for Elixir-style first-arg pipes, but 
> a new iterable API isn't one of them, at least not in this form.

After having talked to you directly, it seemed to me that there is
some confusion about the iterator API vs. the API offered by the data
structure itself. For example:

> $l = new List(1,2, 3);
> $l2 = $l |> map(fn($x) => $x*2);
>
> What is the type of $l2? I would expect it to be a List, but there's currently
> no way to write a map() that statically guarantees that. (And that's before we
> get into generics.)

$l2 wouldn't be a List (or Sequence, to stick with the same
terminology) but an iterator, specifically Iterator<int, int>. If you
want to get back a sequence, you need to populate a new sequence from
the iterator using Iter\toSeq(). We may also decide to introduce a
Sequence::map() method that maps directly to a new sequence, which may
be more efficient for single transformations. That said, the nice
thing about the iterator API is that it generically applies to all
data structures implementing Traversable. For example, an Iter\max()
function would not need to care about the implementation details of
the underlying data structure, nor do all data structures need to
reimplement their own versions of max().
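
To illustrate, here is a sketch under the assumption that map()
returns a lazy iterator and that toSeq() rebuilds a sequence from it.
The functions map(), toSeq() and iterMax() are hypothetical stand-ins
for Iter\map(), Iter\toSeq() and Iter\max() (the last one renamed only
because max() is already a built-in), and a list-shaped array stands in
for a Sequence.

```php
<?php
// Hypothetical stand-ins for Iter\map(), Iter\toSeq() and Iter\max().

function map(iterable $it, callable $fn): Generator
{
    foreach ($it as $key => $value) {
        yield $key => $fn($value); // lazy: nothing runs until the result is consumed
    }
}

function toSeq(iterable $it): array
{
    $seq = [];
    foreach ($it as $value) {
        $seq[] = $value; // keys are discarded, values are re-indexed from 0
    }
    return $seq;
}

function iterMax(iterable $it): mixed
{
    $max = null;
    $first = true;
    foreach ($it as $value) {
        if ($first || $value > $max) {
            $max = $value;
            $first = false;
        }
    }
    if ($first) {
        throw new ValueError('iterMax(): argument must contain at least one element');
    }
    return $max;
}

$l = [1, 2, 3];
$l2 = map($l, fn($x) => $x * 2);       // an iterator, not a list
$isIterator = $l2 instanceof Iterator; // true, Generator implements Iterator
$doubled = toSeq($l2);                 // [2, 4, 6]

// The same iterMax() works on an array, an ArrayIterator and a generator.
$maxFromArray = iterMax([3, 9, 4]);
$maxFromIterator = iterMax(new ArrayIterator([3, 9, 4]));
$maxFromGenerator = iterMax(map([3, 9, 4], fn($x) => -$x));
```

The point of the sketch is only that iterMax() never inspects the
source type; it relies on nothing beyond foreach.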

> Which brings us then to extension functions.

I have largely changed my mind on extension functions. Extension
functions that are exclusively local, static and detached from the
type system are rather useless. Looking at an example:

> function PointEntity.toMessage(): PointMessage {
>     return new PointMessage($this->x, $this->y);
> }
>
> $result = json_encode($point->toMessage());

If for some reason toMessage() cannot be implemented on PointEntity,
there's arguably no benefit of $point->toMessage() over `$point |>
PointEntityExtension\toMessage()` (with an optional import to make it
almost as short). All the extension really achieves is changing the
syntax, but we would already have the pipe operator for this.
Technically, you can use such extensions for untyped, local
polymorphism, but this does not seem like a good approach.

function PointEntity.toMessage(): PointMessage { ... }
function RectEntity.toMessage(): RectMessage { ... }

$entities = [new Point, new Rect];

foreach ($entities as $e) {
    // Technically works, but the type system is entirely unaware.
    $e->toMessage();
    // This breaks, because Point and Rect don't actually implement
    // the ToMessage interface.
    takesToMessage($e);
}
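
The type-system half of this can be reproduced in current PHP without
any extension syntax. In the sketch below, the ToMessage interface, the
string return type, and the Point and MessagePoint classes are
assumptions made for illustration; a regular method stands in for the
extension function.

```php
<?php
interface ToMessage
{
    public function toMessage(): string;
}

// Has a matching method, but does not declare the interface.
final class Point
{
    public function toMessage(): string
    {
        return 'point';
    }
}

// Declares the interface, so the type system knows about it.
final class MessagePoint implements ToMessage
{
    public function toMessage(): string
    {
        return 'message point';
    }
}

function takesToMessage(ToMessage $m): string
{
    return $m->toMessage();
}

$direct = (new Point)->toMessage(); // the method call itself works

try {
    takesToMessage(new Point);
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true; // a matching method is not enough; PHP checks the declared type
}

$accepted = takesToMessage(new MessagePoint);
```

PHP's type checks are nominal, so having a method with the right name
and signature does not satisfy an interface parameter.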

Where extensions would really shine is if they could hook into the
type system by implementing interfaces on types that aren't in your
control. Rust and Swift are two examples that take this approach.

implement ToMessage for Rect { ... }

takesToMessage(new Rect); // Now this actually works.

However, this would be even harder to implement than extension
functions alone. I won't go into detail because this e-mail is
already too long, but I'm happy to discuss it further off-list. All
this to say, I don't think extensions will work well in PHP, but I
also don't think they are necessary for the iterator API.

Regards,
Ilija
