Re: [PHP-DEV] [RFC] Pipe Operator (again)

Larry Garfield Thu, 10 Apr 2025 09:02:14 -0700

On Wed, Apr 9, 2025, at 12:56 AM, Rob Landers wrote:
> On Wed, Apr 9, 2025, at 01:29, Ilija Tovilo wrote:
>> Hi Larry
>> 
>> Sorry again for the delay.
>> 
>> On Fri, Apr 4, 2025 at 6:37 AM Larry Garfield <la...@garfieldtech.com> wrote:
>> >
>> > * A new iterable API is absolutely a good thing and we should do it.
>> > * That said, we *need* to split Sequence, Set, and Dictionary into 
>> > separate types.  We are the only language I reviewed that didn't have them 
>> > as separate constructs with their own APIs.
>> > * The use of the same construct (arrays and iterables) for all three types 
>> > is a fundamental and core flaw in PHP's design that we should not 
>> > double-down on.  It's ergonomically awful, it's bad for performance, and 
>> > it invites major security holes.  (The "Drupageddon" remote exploit was 
>> > caused by using an array and assuming it was sequential when it was 
>> > actually a map.)
>> >
>> > So while I want a new iterable API, the more I think on it, the more I 
>> > think a bunch of map(iterable $it, callable $fn) style functions would not 
>> > be the right way to do it.  That would be easy, but also ineffective.
>> >
>> > The behavior of even basic operations like map and filter are subtly 
>> > different depending on which type you're dealing with.  Whether the input 
>> > is lazy or not is the least of the concerns.  The bigger issue is when to 
>> > pass keys to the $fn; probably always in Dict, probably never in Seq, and 
>> > certainly never in Set (as there are no meaningful keys).  Similarly, when 
>> > filtering a Dict, you would want keys preserved.  When filtering a Seq, 
>> > you'd want the indexes re-zeroed.  (Or to seem like it, give or take 
>> > implementation details.)  And then, yes, there's the laziness question.
>> >
>> > So we'd effectively want three different versions of map(), filter(), etc. 
>> > if we didn't want to perpetuate and further entrench the design flaw and 
>> > security hole that is "sequences and hashes are the same thing if you 
>> > squint."  And... frankly I'd probably vote against an 
>> > iterable/collections API that didn't address that issue.
>> 
>> I fundamentally disagree with this assessment. In most languages,
>> including PHP, iterators are simply a sequence of values that can be
>> consumed. Usually, the consumer should not be concerned with the data
>> structure of the iterated value, this is abstracted away through the
>> iterator. For most languages, both Sequences and Sets are translated
>> 1:1 (i.e. Sequence<T> => Iterator<T>, Set<T> => Iterator<T>).
>> Dictionaries usually result in a tuple, combining both the key and
>> value into a single value pair (Dict<T, U> => Iterator<(T, U)>). PHP
>> is a bit different in that all iterators require a key. Semantically,
>> this makes sense for both Sequences (which are logically indexed by
>> the elements position in the sequence, so Sequence<T> => Iterator<int,
>> T>) and Dicts (which have an explicit key, so Dict<T, U> =>
>> Iterator<T, U>). Sets don't technically have a logical key, but IMO
>> this is not enough of a reason to fundamentally change how iterators
>> work. A sequential number would be fine, which is also what yield
>> without providing a key does. If we really wanted to avoid it, we can
>> make it return null, as this is already allowed for generators.
>> https://3v4l.org/LvIjP
>> 
>> The big upside of treating all iterators the same, regardless of their
>> data source is 1. the code becomes more generic, you don't need three
>> variants of a map() function when one works on all of them.
>> And 2. you can populate any of the data structures from a generic
>> iterator without any data shuffling.
>> 
>> $users
>>     |> Iter\mapKeys(fn($u) => $u->getId())
>>     |> Iter\toDict();
>> 
>> This will work if $users is a Sequence, Set or existing Dict with some
>> other key. Actually, it works for any Traversable. If mapKeys() only
>> applied to Dict iterators you would necessarily have to create a
>> temporary dictionary first, or just not use the iterator API at all.
>> 
>> > However, a simple "first arg" pipe wouldn't allow for that.  Or rather, 
>> > we'd need to implement seqMap(iterable $it, callable $fn), setMap(iterable 
>> > $it, callable $fn), and dictMap(iterable $it, callable $fn).  And the same 
>> > split for filter, and probably a few other things.  That seems 
>> > ergonomically suspect, at best, and still wouldn't really address the 
>> > issue since you would have no way to ensure you're using the "right" 
>> > version of each function. Similarly, a dict version of implode() would 
>> > likely need to take 2 separators, whereas the other types would take only 
>> > one.
>> >
>> > So the more I think on it, the more I think the sort of iterable API that 
>> > first-arg pipes would make easy is... probably not the iterable API we 
>> > want anyway.  There may well be other cases for Elixir-style first-arg 
>> > pipes, but a new iterable API isn't one of them, at least not in this form.
>> 
>> After having talked to you directly, it seemed to me that there is
>> some confusion about the iterator API vs. the API offered by the data
>> structure itself. For example:
>> 
>> > $l = new List(1,2, 3);
>> > $l2 = $l |> map(fn($x) => $x*2);
>> >
>> > What is the type of $l2? I would expect it to be a List, but there's
>> > currently no way to write a map() that statically guarantees that.
>> > (And that's before we get into generics.)
>> 
>> $l2 wouldn't be a List (or Sequence, to stick with the same
>> terminology) but an iterator, specifically Iterator<int, int>. If you
>> want to get back a sequence, you need to populate a new sequence from
>> the iterator using Iter\toSeq(). We may also decide to introduce a
>> Sequence::map() method that maps directly to a new sequence, which may
>> be more efficient for single transformations. That said, the nice
>> thing about the iterator API is that it generically applies to all
>> data structures implementing Traversable. For example, an Iter\max()
>> function would not need to care about the implementation details of
>> the underlying data structure, nor do all data structures need to
>> reimplement their own versions of max().


I agree that max() likely would not need multiple versions.  My concern is with 
cases where the signature of the callback changes depending on the type it's 
on, which is mainly map, filter, and maybe reduce.  Possibly sorted as well, if 
you want to allow sorting by keys.

If I'm following you correctly, you're saying that because PHP is already weird 
(in that abstract iterators are always keyed), it's not increasing the weird 
for dedicated collection objects to have implicit keys when used with an 
abstract iterator API.  Yes?

I think that's valid, but I also know just how many times I've been bitten by 
arrays doing double-duty.  Keys getting lost during a transformation when they 
shouldn't, etc.  I am highly skeptical about perpetuating that, and if we're 
going to revisit collections and iterators I would want to get the kind of 
guarantees that PHP has never given us, but most languages have always had.
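
To make the "keys getting lost" complaint concrete, here is a minimal sketch 
using only existing array functions.  The variable names and data are made up 
for illustration.

```php
<?php
// A "sequence": the keys are just positions.
$seq = ['a', 'b', 'c', 'd'];

// array_filter() preserves keys, so the result has a hole at index 1.
$filtered = array_filter($seq, fn(string $v): bool => $v !== 'b');
// $filtered is [0 => 'a', 2 => 'c', 3 => 'd']

// To get a sequence back, you have to remember to re-index by hand.
$reindexed = array_values($filtered);
// $reindexed is [0 => 'a', 1 => 'c', 2 => 'd']

// A "dictionary" whose integer keys are meaningful IDs.
$usersById = [10 => 'alice', 20 => 'bob'];
$more = [30 => 'carol'];

// array_merge() renumbers integer keys, so the IDs are silently lost.
$merged = array_merge($usersById, $more);
// $merged is [0 => 'alice', 1 => 'bob', 2 => 'carol']

// The + operator keeps them.
$union = $usersById + $more;
// $union is [10 => 'alice', 20 => 'bob', 30 => 'carol']
```

Same array type, same family of functions, and whether the keys survive 
depends on which function you happened to reach for.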

That means, e.g., seq/set/dict values/objects would pretty much have to have 
their own versions of map, filter, etc.  So that means we'd have 4 versions of 
map: seq::map, set::map, dict::map, and iter\map().  When would you use the 
latter over the former?
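
A rough sketch of the split I mean, showing just two of the four.  Everything 
here is hypothetical: there is no Seq class in PHP today, and iter_map() is 
only a stand-in for whatever a generic iterator-level map would be called.

```php
<?php
// Hypothetical sequence type; nothing like this ships with PHP today.
final class Seq implements IteratorAggregate
{
    private array $items;

    public function __construct(mixed ...$items)
    {
        // Force positional keys regardless of how the values were passed.
        $this->items = array_values($items);
    }

    // Seq-specific map: the callback never sees a key, and you get a Seq back.
    public function map(callable $fn): self
    {
        return new self(...array_map($fn, $this->items));
    }

    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->items);
    }

    public function toArray(): array
    {
        return $this->items;
    }
}

// Hypothetical generic map: accepts any iterable and passes keys through
// untouched, but the result is a Generator, not a Seq.
function iter_map(iterable $it, callable $fn): Generator
{
    foreach ($it as $key => $value) {
        yield $key => $fn($value);
    }
}

$seq = new Seq(1, 2, 3);
$double = fn(int $x): int => $x * 2;

$viaMethod = $seq->map($double);         // a Seq
$viaIterator = iter_map($seq, $double);  // a Generator over the same values
```

Both produce the same values; only the method can promise the type you get 
back.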

In any case, I fear this question is moot.  Basically no one but you and I 
seems to like the implicit-first-arg approach, so whether it's viable or not 
sadly doesn't matter.

Unless any voters want to speak up now to correct that impression?

>> > Which brings us then to extension functions.
>> 
>> I have largely changed my mind on extension functions. Extension
>> functions that are exclusively local, static and detached from the
>> type system are rather useless. Looking at an example:
>> 
>> > function PointEntity.toMessage(): PointMessage {
>> >     return new PointMessage($this->x, $this->y);
>> > }
>> >
>> > $result = json_encode($point->toMessage());
>> 
>> If for some reason toMessage() cannot be implemented on PointEntity,
>> there's arguably no benefit of $point->toMessage() over `$point |>
>> PointEntityExtension\toMessage()` (with an optional import to make it
>> almost as short). All the extension really achieves is changing the
>> syntax, but we would already have the pipe operator for this.
>> Technically, you can use such extensions for untyped, local
>> polymorphism, but this does not seem like a good approach.
>> 
>> function PointEntity.toMessage(): PointMessage { ... }
>> function RectEntity.toMessage(): RectMessage { ... }
>> 
>> $entities = [new Point, new Rect];
>> 
>> foreach ($entities as $e) {
>>     // Technically works, but the type system is entirely unaware.
>>     $e->toMessage();
>>     // This breaks, because Point and Rect don't actually implement the
>>     // ToMessage interface.
>>     takesToMessage($e);
>> }

You wouldn't pass $e directly to takesToMessage().  You'd call 
takesMessage($e->toMessage()).  It's literally just a function that you're 
reversing the syntax order on.  It is not supposed to impact the type 
signature.  If it does, then it's Rust Traits, not extension functions.
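
Spelled out with a plain function, which runs today, that looks roughly like 
the following.  The class bodies are my guess at the shapes implied by the 
example above, and takesMessage() is a made-up consumer.

```php
<?php
// Hypothetical classes, mirroring the names used in the quoted example.
final class PointEntity
{
    public function __construct(public int $x, public int $y) {}
}

final class PointMessage
{
    public function __construct(public int $x, public int $y) {}
}

// What `function PointEntity.toMessage()` boils down to: an ordinary
// function that takes the object as its first argument.
function toMessage(PointEntity $point): PointMessage
{
    return new PointMessage($point->x, $point->y);
}

// The consumer accepts the converted value, not the entity itself.
function takesMessage(PointMessage $message): string
{
    return json_encode($message, JSON_THROW_ON_ERROR);
}

$point = new PointEntity(3, 4);

// An extension function would only let this be written as
// takesMessage($point->toMessage()); PointEntity's type does not change.
$result = takesMessage(toMessage($point));
```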

>> Where extensions would really shine is if they could hook into the
>> type system by implementing interfaces on types that aren't in your
>> control. Rust and Swift are two examples that take this approach.
>> 
>> implement ToMessage for Rect { ... }
>> 
>> takesToMessage(new Rect); // Now this actually works.
>> 
>> However, this becomes even harder to implement than extension
>> functions already would. I won't go into detail because this e-mail is
>> already too long, but I'm happy to discuss it further off-list. All
>> this to say, I don't think extensions will work well in PHP, but I
>> also don't think they are necessary for the iterator API.
>> 
>> Regards,
>> Ilija

Every time I daydream about what my ideal object-type-definition syntax would 
be, I eventually end up at Rust. :-)  And then I get sad that as an interpreted 
language, PHP makes that basically impossible.

All of the above leads me back around to "well if we don't do first-arg, then 
we'll want a way to make higher order functions easier to implement."  Which I 
am all for, and have proposed RFCs for in the past, and they've all been 
rejected.  So, yeah.  Maybe once pipes get used people will realize the value. 
:-)

> Hi Ilija and Larry,
>
> This got me thinking: what if instead of "magically" passing a first 
> value to a function, or partial applications, we create a new 
> interface; something like:
>
> interface PipeCompatible {
>   function receiveContext(mixed $lastValue): void;
> }
>
> If the implementing type implements this interface, it will receive the 
> last value via the interface before being called
>
> This would then force userland to implement a bunch of functionality to 
> take true advantage of the pipe operator, but at the same time, allow 
> for extensions (or core / SPL) to also take full advantage of them. 
>
> I have no idea if such a thing works in practice, so I'm just spitballing 
> here.
>
> — Rob

This approach would only be viable on objects.  So you'd have to do 

$a |> new B('c') |> ... ;

to get it to work.  Most of what we would want to use here are functions or 
methods, not manually created objects.  This would also be slower, as it 
involves two function calls instead of one.

Besides, that can already be achieved with __invoke().  

class B {
  public function __construct(private $arg1) {}

  public function __invoke($passedValue): Whatever {
    // Do stuff with both $arg1 and $passedValue
  }
}
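
A filled-in version of the same idea, written with plain calls so it runs 
without the pipe operator.  The Append class is invented for the example.

```php
<?php
// Hypothetical invokable: holds one argument up front and receives the
// piped value later through __invoke().
final class Append
{
    public function __construct(private string $suffix) {}

    public function __invoke(string $passedValue): string
    {
        // Uses both the stored argument and the passed value.
        return $passedValue . $this->suffix;
    }
}

$a = 'foo';

// As I understand the RFC, `$a |> new Append('c')` would amount to calling
// the object with $a, which is what these two lines do by hand.
$step = new Append('c');
$out = $step($a);
// $out is 'fooc'
```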

--Larry Garfield
