Re: [PHP-DEV] [RFC] Pipe Operator (again)

Rob Landers Wed, 09 Apr 2025 00:44:00 -0700

On Wed, Apr 9, 2025, at 01:29, Ilija Tovilo wrote:
> Hi Larry
> 
> Sorry again for the delay.
> 
> On Fri, Apr 4, 2025 at 6:37 AM Larry Garfield <la...@garfieldtech.com> wrote:
> >
> > * A new iterable API is absolutely a good thing and we should do it.
> > * That said, we *need* to split Sequence, Set, and Dictionary into separate 
> > types.  We are the only language I reviewed that didn't have them as 
> > separate constructs with their own APIs.
> > * The use of the same construct (arrays and iterables) for all three types 
> > is a fundamental and core flaw in PHP's design that we should not 
> > double-down on.  It's ergonomically awful, it's bad for performance, and it 
> > invites major security holes.  (The "Drupageddon" remote exploit was caused 
> > by using an array and assuming it was sequential when it was actually a 
> > map.)
> >
> > So while I want a new iterable API, the more I think on it, the more I 
> > think a bunch of map(iterable $it, callable $fn) style functions would not 
> > be the right way to do it.  That would be easy, but also ineffective.
> >
> > The behavior of even basic operations like map and filter is subtly 
> > different depending on which type you're dealing with.  Whether the input 
> > is lazy or not is the least of the concerns.  The bigger issue is when to 
> > pass keys to the $fn; probably always in Dict, probably never in Seq, and 
> > certainly never in Set (as there are no meaningful keys).  Similarly, when 
> > filtering a Dict, you would want keys preserved.  When filtering a Seq, 
> > you'd want the indexes re-zeroed.  (Or to seem like it, give or take 
> > implementation details.)  And then, yes, there's the laziness question.
> >
> > So we'd effectively want three different versions of map(), filter(), etc. 
> > if we didn't want to perpetuate and further entrench the design flaw and 
> > security hole that is "sequences and hashes are the same thing if you 
> > squint."  And... frankly I'd probably vote against an iterable/collections 
> > API that didn't address that issue.
> 
> I fundamentally disagree with this assessment. In most languages,
> including PHP, iterators are simply a sequence of values that can be
> consumed. Usually, the consumer should not be concerned with the data
> structure of the iterated value, this is abstracted away through the
> iterator. For most languages, both Sequences and Sets are translated
> 1:1 (i.e. Sequence<T> => Iterator<T>, Set<T> => Iterator<T>).
> Dictionaries usually result in a tuple, combining both the key and
> value into a single value pair (Dict<T, U> => Iterator<(T, U)>). PHP
> is a bit different in that all iterators require a key. Semantically,
> this makes sense for both Sequences (which are logically indexed by
> the element's position in the sequence, so Sequence<T> => Iterator<int,
> T>) and Dicts (which have an explicit key, so Dict<T, U> =>
> Iterator<T, U>). Sets don't technically have a logical key, but IMO
> this is not enough of a reason to fundamentally change how iterators
> work. A sequential number would be fine, which is also what yield
> without providing a key does. If we really wanted to avoid it, we can
> make it return null, as this is already allowed for generators.
> https://3v4l.org/LvIjP
> 
> The big upside of treating all iterators the same, regardless of their
> data source, is that 1. the code becomes more generic: you don't need
> three variants of a map() function when one works on all of them. And
> 2. you can populate any of the data structures from a generic iterator
> without any data shuffling.
> 
> $users
>     |> Iter\mapKeys(fn($u) => $u->getId())
>     |> Iter\toDict();
> 
> This will work if $users is a Sequence, Set or existing Dict with some
> other key. Actually, it works for any Traversable. If mapKeys() only
> applied to Dict iterators you would necessarily have to create a
> temporary dictionary first, or just not use the iterator API at all.
> 
> > However, a simple "first arg" pipe wouldn't allow for that.  Or rather, 
> > we'd need to implement seqMap(iterable $it, callable $fn), setMap(iterable 
> > $it, callable $fn), and dictMap(iterable $it, callable $fn).  And the same 
> > split for filter, and probably a few other things.  That seems 
> > ergonomically suspect, at best, and still wouldn't really address the issue 
> > since you would have no way to ensure you're using the "right" version of 
> > each function. Similarly, a dict version of implode() would likely need to 
> > take 2 separators, whereas the other types would take only one.
> >
> > So the more I think on it, the more I think the sort of iterable API that 
> > first-arg pipes would make easy is... probably not the iterable API we want 
> > anyway.  There may well be other cases for Elixir-style first-arg pipes, 
> > but a new iterable API isn't one of them, at least not in this form.
> 
> After having talked to you directly, it seemed to me that there is
> some confusion about the iterator API vs. the API offered by the data
> structure itself. For example:
> 
> > $l = new List(1,2, 3);
> > $l2 = $l |> map(fn($x) => $x*2);
> >
> > What is the type of $l2? I would expect it to be a List, but there's
> > currently no way to write a map() that statically guarantees that. (And
> > that's before we get into generics.)
> 
> $l2 wouldn't be a List (or Sequence, to stick with the same
> terminology) but an iterator, specifically Iterator<int, int>. If you
> want to get back a sequence, you need to populate a new sequence from
> the iterator using Iter\toSeq(). We may also decide to introduce a
> Sequence::map() method that maps directly to a new sequence, which may
> be more efficient for single transformations. That said, the nice
> thing about the iterator API is that it generically applies to all
> data structures implementing Traversable. For example, an Iter\max()
> function would not need to care about the implementation details of
> the underlying data structure, nor do all data structures need to
> reimplement their own versions of max().
> 
> > Which brings us then to extension functions.
> 
> I have largely changed my mind on extension functions. Extension
> functions that are exclusively local, static and detached from the
> type system are rather useless. Looking at an example:
> 
> > function PointEntity.toMessage(): PointMessage {
> >     return new PointMessage($this->x, $this->y);
> > }
> >
> > $result = json_encode($point->toMessage());
> 
> If for some reason toMessage() cannot be implemented on PointEntity,
> there's arguably no benefit of $point->toMessage() over `$point |>
> PointEntityExtension\toMessage()` (with an optional import to make it
> almost as short). All the extension really achieves is changing the
> syntax, but we would already have the pipe operator for this.
> Technically, you can use such extensions for untyped, local
> polymorphism, but this does not seem like a good approach.
> 
> function PointEntity.toMessage(): PointMessage { ... }
> function RectEntity.toMessage(): RectMessage { ... }
> 
> $entities = [new Point, new Rect];
> 
> foreach ($entities as $e) {
>     // Technically works, but the type system is entirely unaware.
>     $e->toMessage();
>     // This breaks, because Point and Rect don't actually implement the
>     // ToMessage interface.
>     takesToMessage($e);
> }
> 
> Where extensions would really shine is if they could hook into the
> type system by implementing interfaces on types that aren't in your
> control. Rust and Swift are two examples that take this approach.
> 
> implement ToMessage for Rect { ... }
> 
> takesToMessage(new Rect); // Now this actually works.
> 
> However, this becomes even harder to implement than extension
> functions already would. I won't go into detail because this e-mail is
> already too long, but I'm happy to discuss it further off-list. All
> this to say, I don't think extensions will work well in PHP, but I
> also don't think they are necessary for the iterator API.
> 
> Regards,
> Ilija
>


Hi Ilija and Larry,

This got me thinking: what if, instead of "magically" passing a first value to a 
function or relying on partial application, we created a new interface? Something like:

interface PipeCompatible {
  function receiveContext(mixed $lastValue): void;
}

If the type on the right-hand side of the pipe implements this interface, it 
receives the last value via receiveContext() before being called.
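
A rough userland sketch of the idea, under some loud assumptions: the pipe() 
helper below only stands in for whatever the engine would do for `|>`, and the 
Map class is a made-up example stage. Only PipeCompatible and receiveContext() 
come from the proposal above; nothing here reflects an actual implementation.

```php
<?php
declare(strict_types=1);

interface PipeCompatible {
    public function receiveContext(mixed $lastValue): void;
}

// Hypothetical stage: stores the piped value, then maps over it when invoked.
final class Map implements PipeCompatible {
    private iterable $input = [];

    public function __construct(private Closure $fn) {}

    public function receiveContext(mixed $lastValue): void {
        $this->input = $lastValue;
    }

    public function __invoke(): array {
        $out = [];
        foreach ($this->input as $key => $value) {
            $out[$key] = ($this->fn)($value);
        }
        return $out;
    }
}

// Hypothetical stand-in for the operator: PipeCompatible stages get the value
// through receiveContext() and are then called with no arguments; any other
// callable simply receives the value as its only argument.
function pipe(mixed $value, callable ...$stages): mixed {
    foreach ($stages as $stage) {
        if ($stage instanceof PipeCompatible) {
            $stage->receiveContext($value);
            $value = $stage();
        } else {
            $value = $stage($value);
        }
    }
    return $value;
}

$result = pipe([1, 2, 3], new Map(fn($x) => $x * 2), 'array_sum');
```

One thing the sketch makes visible: each stage object carries state between 
receiveContext() and the call, so reusing a stage instance across pipelines 
would need some care.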

This would then force userland to implement a bunch of functionality to take 
true advantage of the pipe operator, but at the same time, allow for extensions 
(or core / SPL) to also take full advantage of them. 

I have no idea if such a thing works in practice, so I'm just spitballing here.

— Rob
