Hi Larry,

Sorry again for the delay.

On Fri, Apr 4, 2025 at 6:37 AM Larry Garfield <la...@garfieldtech.com> wrote:
>
> * A new iterable API is absolutely a good thing and we should do it.
> * That said, we *need* to split Sequence, Set, and Dictionary into separate
> types. We are the only language I reviewed that didn't have them as separate
> constructs with their own APIs.
> * The use of the same construct (arrays and iterables) for all three types is
> a fundamental and core flaw in PHP's design that we should not double-down
> on. It's ergonomically awful, it's bad for performance, and it invites major
> security holes. (The "Drupageddon" remote exploit was caused by using an
> array and assuming it was sequential when it was actually a map.)
>
> So while I want a new iterable API, the more I think on it, the more I think
> a bunch of map(iterable $it, callable $fn) style functions would not be the
> right way to do it. That would be easy, but also ineffective.
>
> The behavior of even basic operations like map and filter are subtly
> different depending on which type you're dealing with. Whether the input is
> lazy or not is the least of the concerns. The bigger issue is when to pass
> keys to the $fn; probably always in Dict, probably never in Seq, and
> certainly never in Set (as there are no meaningful keys). Similarly, when
> filtering a Dict, you would want keys preserved. When filtering a Seq, you'd
> want the indexes re-zeroed. (Or to seem like it, given or take
> implementation details.) And then, yes, there's the laziness question.
>
> So we'd effectively want three different versions of map(), filter(), etc. if
> we didn't want to perpetuate and further entrench the design flaw and
> security hole that is "sequences and hashes are the same thing if you
> squint." And... frankly I'd probably vote against an interable/collections
> API that didn't address that issue.

I fundamentally disagree with this assessment.

In most languages, including PHP, iterators are simply a sequence of values that can be consumed. Usually, the consumer should not be concerned with the data structure being iterated; that is abstracted away by the iterator. For most languages, both Sequences and Sets are translated 1:1 (i.e. Sequence<T> => Iterator<T>, Set<T> => Iterator<T>). Dictionaries usually result in a tuple, combining the key and value into a single pair (Dict<T, U> => Iterator<(T, U)>).

PHP is a bit different in that all iterators require a key. Semantically, this makes sense for both Sequences (which are logically indexed by the element's position in the sequence, so Sequence<T> => Iterator<int, T>) and Dicts (which have an explicit key, so Dict<T, U> => Iterator<T, U>). Sets don't technically have a logical key, but IMO this is not enough of a reason to fundamentally change how iterators work. A sequential number would be fine, which is also what yield without providing a key does. If we really wanted to avoid it, we could make it return null, as this is already allowed for generators.

https://3v4l.org/LvIjP

The big upside of treating all iterators the same, regardless of their data source, is twofold:

1. The code becomes more generic: you don't need three variants of a map() function when one works on all of them.
2. You can populate any of the data structures from a generic iterator without any data shuffling.

    $users
        |> Iter\mapKeys(fn($u) => $u->getId())
        |> Iter\toDict();

This will work if $users is a Sequence, Set or existing Dict with some other key. Actually, it works for any Traversable. If mapKeys() only applied to Dict iterators, you would necessarily have to create a temporary dictionary first, or just not use the iterator API at all.

> However, a simple "first arg" pipe wouldn't allow for that. Or rather, we'd
> need to implement seqMap(iterable $it, callable $fn), setMap(iterable $it,
> callable $fn), and dictMap(iterable $it, callable $fn).
> And the same split
> for filter, and probably a few other things. That seems ergonomically
> suspect, at best, and still wouldn't really address the issue since you would
> have no way to ensure you're using the "right" version of each function.
> Similarly, a dict version of implode() would likely need to take 2
> separators, whereas the other types would take only one.
>
> So the more I think on it, the more I think the sort of iterable API that
> first-arg pipes would make easy is... probably not the iterable API we want
> anyway. There may well be other cases for Elixir-style first-arg pipes, but
> a new iterable API isn't one of them, at least not in this form.

After having talked to you directly, it seemed to me that there is some confusion about the iterator API vs. the API offered by the data structure itself. For example:

> $l = new List(1,2, 3);
> $l2 = $l |> map(fn($x) => $x*2);
>
> What is the type of $l2? I would expect it to be a List, but there's currently
> no way to write a map() that statically guarantees that. (And that's before we
> get into generics.)

$l2 wouldn't be a List (or Sequence, to stick with the same terminology) but an iterator, specifically Iterator<int, int>. If you want to get back a sequence, you need to populate a new sequence from the iterator using Iter\toSeq(). We may also decide to introduce a Sequence::map() method that maps directly to a new sequence, which may be more efficient for single transformations.

That said, the nice thing about the iterator API is that it generically applies to all data structures implementing Traversable. For example, an Iter\max() function would not need to care about the implementation details of the underlying data structure, nor do all data structures need to reimplement their own versions of max().

> Which brings us then to extension functions.

I have largely changed my mind on extension functions.
Extension functions that are exclusively local, static and detached from the type system are rather useless. Looking at an example:

> function PointEntity.toMessage(): PointMessage {
>     return new PointMessage($this->x, $this->y);
> }
>
> $result = json_encode($point->toMessage());

If for some reason toMessage() cannot be implemented on PointEntity, there's arguably no benefit of $point->toMessage() over `$point |> PointEntityExtension\toMessage()` (with an optional import to make it almost as short). All the extension really achieves is changing the syntax, but we would already have the pipe operator for this.

Technically, you can use such extensions for untyped, local polymorphism, but this does not seem like a good approach.

    function PointEntity.toMessage(): PointMessage { ... }
    function RectEntity.toMessage(): RectMessage { ... }

    $entities = [new Point, new Rect];

    foreach ($entities as $e) {
        $e->toMessage(); // Technically works, but the type system is entirely unaware.
        takesToMessage($e); // This breaks, because Point and Rect don't actually implement the ToMessage interface.
    }

Where extensions would really shine is if they could hook into the type system by implementing interfaces on types that aren't in your control. Rust and Swift are two examples that take this approach.

    implement ToMessage for Rect { ... }
    takesToMessage(new Rect); // Now this actually works.

However, this becomes even harder to implement than extension functions already would be. I won't go into detail because this e-mail is already too long, but I'm happy to discuss it further off-list.

All this to say, I don't think extensions will work well in PHP, but I also don't think they are necessary for the iterator API.

Regards,
Ilija
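
To illustrate the iterator pipeline argued for above, here is a minimal sketch that runs on current PHP. It is only an approximation under stated assumptions: Iter\mapKeys(), Iter\toDict() and Iter\toSeq() do not exist today, so the global functions mapKeys(), map(), toDict() and toSeq() below are hypothetical stand-ins, plain arrays stand in for the Dict and Sequence types, the sample data is invented, and ordinary nested calls replace the |> operator.

```php
<?php
// Hypothetical stand-ins for Iter\mapKeys(), a generic map(), Iter\toDict()
// and Iter\toSeq(). None of these are real PHP functions.

/** Lazily re-key each element; accepts any iterable, whatever its source. */
function mapKeys(iterable $it, callable $fn): Generator
{
    foreach ($it as $value) {
        yield $fn($value) => $value;
    }
}

/** Lazily transform each value, passing the key through unchanged. */
function map(iterable $it, callable $fn): Generator
{
    foreach ($it as $key => $value) {
        yield $key => $fn($value);
    }
}

/** Collect into a key-preserving array (a later duplicate key overwrites). */
function toDict(iterable $it): array
{
    $dict = [];
    foreach ($it as $key => $value) {
        $dict[$key] = $value;
    }
    return $dict;
}

/** Collect values only, discarding keys so indexes restart at 0. */
function toSeq(iterable $it): array
{
    $seq = [];
    foreach ($it as $value) {
        $seq[] = $value;
    }
    return $seq;
}

// Assumed sample data: a plain list of users, i.e. a sequence.
$users = [
    ['id' => 42, 'name' => 'Ada'],
    ['id' => 7, 'name' => 'Linus'],
];

// Sequence in, dictionary out, with no temporary dictionary in between.
$byId = toDict(mapKeys($users, fn(array $u) => $u['id']));

// The same map() works on any Traversable; the caller picks the result type.
$doubled = toSeq(map(new ArrayIterator([1, 2, 3]), fn(int $x) => $x * 2));
```

The same mapKeys() and map() accept an array, an ArrayIterator, or another generator, and the decision about which structure to end up with is deferred to the final toDict() or toSeq() call.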