On Wed, Apr 9, 2025, at 01:29, Ilija Tovilo wrote:
> Hi Larry
>
> Sorry again for the delay.
>
> On Fri, Apr 4, 2025 at 6:37 AM Larry Garfield <la...@garfieldtech.com> wrote:
> >
> > * A new iterable API is absolutely a good thing and we should do it.
> > * That said, we *need* to split Sequence, Set, and Dictionary into separate
> >   types. We are the only language I reviewed that didn't have them as
> >   separate constructs with their own APIs.
> > * The use of the same construct (arrays and iterables) for all three types
> >   is a fundamental and core flaw in PHP's design that we should not
> >   double-down on. It's ergonomically awful, it's bad for performance, and it
> >   invites major security holes. (The "Drupageddon" remote exploit was caused
> >   by using an array and assuming it was sequential when it was actually a
> >   map.)
> >
> > So while I want a new iterable API, the more I think on it, the more I
> > think a bunch of map(iterable $it, callable $fn) style functions would not
> > be the right way to do it. That would be easy, but also ineffective.
> >
> > The behavior of even basic operations like map and filter is subtly
> > different depending on which type you're dealing with. Whether the input
> > is lazy or not is the least of the concerns. The bigger issue is when to
> > pass keys to the $fn; probably always in Dict, probably never in Seq, and
> > certainly never in Set (as there are no meaningful keys). Similarly, when
> > filtering a Dict, you would want keys preserved. When filtering a Seq,
> > you'd want the indexes re-zeroed. (Or to seem like it, give or take
> > implementation details.) And then, yes, there's the laziness question.
> >
> > So we'd effectively want three different versions of map(), filter(), etc.
> > if we didn't want to perpetuate and further entrench the design flaw and
> > security hole that is "sequences and hashes are the same thing if you
> > squint." And...
> > frankly I'd probably vote against an iterable/collections
> > API that didn't address that issue.
>
> I fundamentally disagree with this assessment. In most languages,
> including PHP, iterators are simply a sequence of values that can be
> consumed. Usually, the consumer should not be concerned with the data
> structure of the iterated value, this is abstracted away through the
> iterator. For most languages, both Sequences and Sets are translated
> 1:1 (i.e. Sequence<T> => Iterator<T>, Set<T> => Iterator<T>).
> Dictionaries usually result in a tuple, combining both the key and
> value into a single value pair (Dict<T, U> => Iterator<(T, U)>). PHP
> is a bit different in that all iterators require a key. Semantically,
> this makes sense for both Sequences (which are logically indexed by
> the element's position in the sequence, so Sequence<T> =>
> Iterator<int, T>) and Dicts (which have an explicit key, so
> Dict<T, U> => Iterator<T, U>). Sets don't technically have a logical
> key, but IMO this is not enough of a reason to fundamentally change how
> iterators work. A sequential number would be fine, which is also what
> yield without providing a key does. If we really wanted to avoid it, we
> can make it return null, as this is already allowed for generators.
> https://3v4l.org/LvIjP
>
> The big upside of treating all iterators the same, regardless of their
> data source is 1. the code becomes more generic, you don't need three
> variants of a value map() function when one works on all of them.
> And 2. you can populate any of the data structures from a generic
> iterator without any data shuffling.
>
> $users
>     |> Iter\mapKeys(fn($u) => $u->getId())
>     |> Iter\toDict();
>
> This will work if $users is a Sequence, Set or existing Dict with some
> other key. Actually, it works for any Traversable. If mapKeys() only
> applied to Dict iterators you would necessarily have to create a
> temporary dictionary first, or just not use the iterator API at all.
>
> > However, a simple "first arg" pipe wouldn't allow for that. Or rather,
> > we'd need to implement seqMap(iterable $it, callable $fn), setMap(iterable
> > $it, callable $fn), and dictMap(iterable $it, callable $fn). And the same
> > split for filter, and probably a few other things. That seems
> > ergonomically suspect, at best, and still wouldn't really address the issue
> > since you would have no way to ensure you're using the "right" version of
> > each function. Similarly, a dict version of implode() would likely need to
> > take 2 separators, whereas the other types would take only one.
> >
> > So the more I think on it, the more I think the sort of iterable API that
> > first-arg pipes would make easy is... probably not the iterable API we want
> > anyway. There may well be other cases for Elixir-style first-arg pipes,
> > but a new iterable API isn't one of them, at least not in this form.
>
> After having talked to you directly, it seemed to me that there is
> some confusion about the iterator API vs. the API offered by the data
> structure itself. For example:
>
> > $l = new List(1,2, 3);
> > $l2 = $l |> map(fn($x) => $x*2);
> >
> > What is the type of $l2? I would expect it to be a List, but there's
> > currently no way to write a map() that statically guarantees that. (And
> > that's before we get into generics.)
>
> $l2 wouldn't be a List (or Sequence, to stick with the same
> terminology) but an iterator, specifically Iterator<int, int>. If you
> want to get back a sequence, you need to populate a new sequence from
> the iterator using Iter\toSeq(). We may also decide to introduce a
> Sequence::map() method that maps directly to a new sequence, which may
> be more efficient for single transformations. That said, the nice
> thing about the iterator API is that it generically applies to all
> data structures implementing Traversable.
> For example, an Iter\max()
> function would not need to care about the implementation details of
> the underlying data structure, nor do all data structures need to
> reimplement their own versions of max().
>
> > Which brings us then to extension functions.
>
> I have largely changed my mind on extension functions. Extension
> functions that are exclusively local, static and detached from the
> type system are rather useless. Looking at an example:
>
> > function PointEntity.toMessage(): PointMessage {
> >     return new PointMessage($this->x, $this->y);
> > }
> >
> > $result = json_encode($point->toMessage());
>
> If for some reason toMessage() cannot be implemented on PointEntity,
> there's arguably no benefit of $point->toMessage() over `$point |>
> PointEntityExtension\toMessage()` (with an optional import to make it
> almost as short). All the extension really achieves is changing the
> syntax, but we would already have the pipe operator for this.
> Technically, you can use such extensions for untyped, local
> polymorphism, but this does not seem like a good approach.
>
> function PointEntity.toMessage(): PointMessage { ... }
> function RectEntity.toMessage(): RectMessage { ... }
>
> $entities = [new Point, new Rect];
>
> foreach ($entities as $e) {
>     $e->toMessage(); // Technically works, but the type system is entirely unaware.
>     takesToMessage($e); // This breaks, because Point and Rect don't actually implement the ToMessage interface.
> }
>
> Where extensions would really shine is if they could hook into the
> type system by implementing interfaces on types that aren't in your
> control. Rust and Swift are two examples that take this approach.
>
> implement ToMessage for Rect { ... }
>
> takesToMessage(new Rect); // Now this actually works.
>
> However, this becomes even harder to implement than extension
> functions already would. I won't go into detail because this e-mail is
> already too long, but I'm happy to discuss it further off-list.
> All this to say, I don't think extensions will work well in PHP, but I
> also don't think they are necessary for the iterator API.
>
> Regards,
> Ilija

Hi Ilija and Larry,

This got me thinking: what if, instead of "magically" passing a first value to a function, or using partial application, we create a new interface? Something like:

interface PipeCompatible {
    function receiveContext(mixed $lastValue): void;
}

If the callee implements this interface, it will receive the last value through receiveContext() before being called.

This would force userland to implement a fair amount of functionality to take true advantage of the pipe operator, but it would also let extensions (or core / SPL) take full advantage of it.

I have no idea if such a thing works in practice, so I'm just spitballing here.

Rob
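
To make the idea a little more concrete, here is a rough userland sketch of how a PipeCompatible callee might behave. Everything beyond the interface itself is an assumption made for illustration: the pipe() helper only stands in for whatever the engine would do for `$value |> $stage`, DoubleAll is a made-up stage, and calling the stage with no arguments after receiveContext() is just one possible reading of "before being called".

```php
<?php
declare(strict_types=1);

// The interface as proposed in the message above.
interface PipeCompatible
{
    public function receiveContext(mixed $lastValue): void;
}

// Hypothetical stage: doubles every element of the iterable it was handed.
final class DoubleAll implements PipeCompatible
{
    private iterable $input = [];

    public function receiveContext(mixed $lastValue): void
    {
        $this->input = $lastValue;
    }

    public function __invoke(): array
    {
        $out = [];
        foreach ($this->input as $key => $value) {
            $out[$key] = $value * 2;
        }
        return $out;
    }
}

// Hypothetical stand-in for the engine's handling of `$value |> $stage`.
function pipe(mixed $value, callable ...$stages): mixed
{
    foreach ($stages as $stage) {
        if ($stage instanceof PipeCompatible) {
            $stage->receiveContext($value); // hand over the previous result first
            $value = $stage();              // then call with no arguments (assumption)
        } else {
            $value = $stage($value);        // ordinary callables get the value as argument
        }
    }
    return $value;
}

$result = pipe([1, 2, 3], new DoubleAll(), array_sum(...));
var_dump($result); // int(12)
```

One thing the sketch makes visible: the stage has to hold the received value as state between receiveContext() and the call, so reusing one stage object across pipelines would need some care.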