Hi Levi Morrison,

> > > Hi internals,
> > >
> > > > I've created a new RFC https://wiki.php.net/rfc/cachediterable adding CachedIterable,
> > > > which eagerly evaluates any iterable and contains an immutable copy of the keys and
> > > > values of the iterable it was constructed from.
> > > >
> > > > This has the proposed signature:
> > > >
> > > > ```
> > > > final class CachedIterable implements IteratorAggregate, Countable, JsonSerializable
> > > > {
> > > >     public function __construct(iterable $iterator) {}
> > > >     public function getIterator(): InternalIterator {}
> > > >     public function count(): int {}
> > > >     // [[$key1, $value1], [$key2, $value2]]
> > > >     public static function fromPairs(array $pairs): CachedIterable {}
> > > >     // [[$key1, $value1], [$key2, $value2]]
> > > >     public function toPairs(): array {}
> > > >     public function __serialize(): array {}  // [$k1, $v1, $k2, $v2, ...]
> > > >     public function __unserialize(array $data): void {}
> > > >
> > > >     // useful for converting iterables back to arrays for further processing
> > > >     public function keys(): array {}  // [$k1, $k2, ...]
> > > >     public function values(): array {}  // [$v1, $v2, ...]
> > > >     // useful to efficiently get offsets at the middle/end of a long iterable
> > > >     public function keyAt(int $offset): mixed {}
> > > >     public function valueAt(int $offset): mixed {}
> > > >
> > > >     // '[["key1","value1"],["key2","value2"]]' instead of '{...}'
> > > >     public function jsonSerialize(): array {}
> > > >     // dynamic properties are forbidden
> > > > }
> > > > ```
> > > >
> > > > Currently, PHP does not provide a built-in way to store the state of an arbitrary
> > > > iterable for reuse later (when the iterable has arbitrary keys, or when keys might
> > > > be repeated). It would be useful to do so for many use cases, such as:
> > > >
> > > > 1. Creating a rewindable copy of a non-rewindable Traversable
> > > > 2. Generating an IteratorAggregate from a class still implementing Iterator
> > > > 3. In the future, providing internal or userland helpers such as
> > > >    iterable_flip(iterable $input), iterable_take(iterable $input, int $limit),
> > > >    iterable_chunk(iterable $input, int $chunk_size), iterable_reverse(), etc.
> > > >    (these are not part of the RFC)
> > > > 4. Providing memory-efficient random access to both keys and values of arbitrary
> > > >    key-value sequences
> > > >
> > > > Having this implemented as an internal class would also allow it to be much more
> > > > efficient than a userland solution (in terms of time to create, time to iterate
> > > > over the result, and total memory usage). See https://wiki.php.net/rfc/cachediterable#benchmarks
> > > >
> > > > After some consideration, this is being created as a standalone RFC, and going in
> > > > the global namespace:
> > > >
> > > > - Based on early feedback on https://wiki.php.net/rfc/any_all_on_iterable#straw_poll
> > > >   (on the namespace preferred in previous polls), it seems like it's way too early
> > > >   for me to be proposing namespaces in any RFCs for PHP adding to modules that
> > > >   already exist, when there is no consensus.
> > > >
> > > >   An earlier attempt by others at creating a policy for namespaces in general
> > > >   (https://wiki.php.net/rfc/php_namespace_policy#vote) also did not pass.
> > > >
> > > >   Having even 40% of voters opposed to introducing a given namespace (in
> > > >   pre-existing modules) makes it an impractical choice when RFCs require a 2/3
> > > >   majority to pass.
> > > > - While some may argue that a different namespace might pass,
> > > >   https://wiki.php.net/rfc/any_all_on_iterable_straw_poll_namespace#vote had a
> > > >   sharp dropoff in feedback after the 3rd form.
> > > >   I don't know how to interpret that: e.g., are unranked namespaces preferred even
> > > >   less than the options that were ranked, or were they just not seen as affecting
> > > >   the final result?
> > >
> > > A heads up - I will probably start voting on 
> > > https://wiki.php.net/rfc/cachediterable this weekend after 
> > > https://wiki.php.net/rfc/cachediterable_straw_poll is finished.
> > >
> > > Any other feedback on CachedIterable?
> > >
> > > Thanks,
> > > Tyson
> > >
> > > --
> > > PHP Internals - PHP Runtime Development Mailing List
> > > To unsubscribe, visit: https://www.php.net/unsub.php
> > >
> >
> > Based on a recent comment you made on GitHub, it seems like
> > `CachedIterable` eagerly creates the datastore instead of doing so
> > on-demand. Is this correct?
> 
> Sorry, yes, that's correct and pointed out in the RFC.
> 
> I think that's a significant implementation flaw. I don't see why we'd
> balloon memory usage unnecessarily by being eager -- if an operation
> needs to fetch more data then it can go ahead and do so.

First, PHP's standard library accommodates a wide variety of use cases, of which I
believe eager evaluation is the most common.
There is no reason that an eagerly evaluated CachedIterable and a lazily evaluated
LazyCachedIterable couldn't both be added at some point, if both had passing RFCs.
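To make the eager semantics concrete, here is a rough userland approximation. It is a hypothetical sketch only: the class name `UserlandCachedIterable` is made up, it assumes PHP 8.0+, it covers just a subset of the proposed API, and it is not the RFC's implementation (which is internal and, per the RFC's benchmarks, more efficient than a userland solution).

```php
<?php
declare(strict_types=1);

// Hypothetical userland approximation of the eager semantics (PHP 8.0+).
// Keys and values are copied into two parallel lists in the constructor,
// so repeated keys and keys of any type are preserved.
final class UserlandCachedIterable implements IteratorAggregate, Countable
{
    /** @var list<mixed> */
    private array $keys = [];
    /** @var list<mixed> */
    private array $values = [];

    public function __construct(iterable $iterator)
    {
        // Eager evaluation: the whole input is consumed here, exactly once.
        foreach ($iterator as $key => $value) {
            $this->keys[] = $key;
            $this->values[] = $value;
        }
    }

    public function getIterator(): Generator
    {
        // Each call returns a fresh generator over the cached lists,
        // so the object can be iterated any number of times, even nested.
        foreach ($this->keys as $i => $key) {
            yield $key => $this->values[$i];
        }
    }

    public function count(): int
    {
        return count($this->keys);
    }

    public function keys(): array
    {
        return $this->keys;
    }

    public function values(): array
    {
        return $this->values;
    }

    public function keyAt(int $offset): mixed
    {
        $this->checkOffset($offset);
        return $this->keys[$offset];
    }

    public function valueAt(int $offset): mixed
    {
        $this->checkOffset($offset);
        return $this->values[$offset];
    }

    public function toPairs(): array
    {
        // Zips the two lists: [[$key1, $value1], [$key2, $value2], ...]
        return array_map(null, $this->keys, $this->values);
    }

    private function checkOffset(int $offset): void
    {
        if ($offset < 0 || $offset >= count($this->keys)) {
            throw new OutOfBoundsException("Offset $offset is out of range");
        }
    }
}

// Usage: a single-use generator with a repeated key.
function letters(): Generator
{
    yield 'a' => 1;
    yield 'a' => 2;
    yield 'b' => 3;
}

$cached = new UserlandCachedIterable(letters());
```

Because the input is consumed in the constructor, the generator is read only once; every later `foreach`, `keyAt()` or `toPairs()` call works from the cached lists.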

(This is referring to https://en.wikipedia.org/wiki/Lazy_evaluation and 
https://en.wikipedia.org/wiki/Eager_evaluation)

As was stated in that GitHub Discussion,

1) If a CachedIterable were to be used in the standard library or a user-defined library,
   many end users would want the standard library to return something that could be
   iterated over multiple times.
   The limit of a single iteration was a source of bugs in SPL classes such as
   https://www.php.net/arrayobject prior to them being switched to IteratorAggregate.

   (This is concerning whether functions such as `*filter` and `*map` should evaluate
   the result eagerly or lazily if they do get added.
   It is possible for a LazyCachedIterable to be implemented that computes values on
   demand, but see the points below.)

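```
// map() and some_pair_predicate() are placeholders; map()'s arguments are elided.
$foo = map(...);
foreach ($foo as $i => $v1) {
    // The inner loop iterates over the same $foo, so $foo must support
    // multiple (and nested) iterations.
    foreach ($foo as $j => $v2) {
        if (some_pair_predicate($v1, $v2)) {
            // do something
        }
    }
}
```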
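The hazard in the nested loop above can be reproduced today with plain generators. In this sketch, `lazy_map()`, `eager_map()` and `count_pairs()` are hypothetical helpers written for illustration; the eager variant returns an array for brevity, so unlike CachedIterable it would collapse repeated keys.

```php
<?php
declare(strict_types=1);

// Hypothetical lazy map(): values are computed on demand by a generator.
function lazy_map(callable $fn, iterable $input): Generator
{
    foreach ($input as $key => $value) {
        yield $key => $fn($value);
    }
}

// Hypothetical eager map(): the whole result is computed up front.
// (A plain array is used for brevity; it would collapse repeated keys.)
function eager_map(callable $fn, iterable $input): array
{
    $result = [];
    foreach ($input as $key => $value) {
        $result[$key] = $fn($value);
    }
    return $result;
}

// Counts how many ($v1, $v2) combinations the nested loops visit.
function count_pairs(iterable $foo): int
{
    $pairs = 0;
    foreach ($foo as $i => $v1) {
        foreach ($foo as $j => $v2) {
            $pairs++;
        }
    }
    return $pairs;
}

$double = fn (int $x): int => $x * 2;

// Array: every combination is visited (3 * 3).
$eagerPairs = count_pairs(eager_map($double, [1, 2, 3]));
// Generator: both loops share one generator, so the inner loop exhausts it.
$lazyPairs = count_pairs(lazy_map($double, [1, 2, 3]));
```

With the array, the nested loops visit all 9 combinations. With the generator, the inner loop consumes the generator that the outer loop is also using, so the outer loop ends early and fewer combinations are visited, without any error being raised.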

2) Userland library/application authors who are interested in lazy generators could
   use or implement something such as https://github.com/nikic/iter instead. My opinion
   is that the standard library should provide something that is easy to understand,
   debug, serialize or represent, etc.
   I expect the inner iterable would be hidden entirely from var_dump in a
   LazyCachedIterable, as an implementation detail.

3) It would be harder to understand why SomeFrameworkException is thrown in code
   unrelated to that framework when a lazy (instead of eager) iterable is passed to
   some function that accepts a generic iterable, and harder to write correct
   exception handling for it if done in a lazy generation style.

   Many RFCs have been rejected due to being perceived as likely to be misused in
   userland or to make code harder to understand.
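A small sketch of the difference described in point 3. `SomeFrameworkException` is the placeholder name from the text; `frameworkRows()` and `sum_values()` are hypothetical. With a lazy source, the framework's exception escapes from inside the generic function's loop after that function has already started its work; materializing the input first (here with `iterator_to_array()`) raises it at the call site instead.

```php
<?php
declare(strict_types=1);

// Stand-in for the framework exception named in point 3 (illustrative only).
class SomeFrameworkException extends RuntimeException {}

// Hypothetical framework-provided source that fails partway through.
function frameworkRows(): Generator
{
    yield 1;
    yield 2;
    throw new SomeFrameworkException('connection lost');
}

// Hypothetical generic library code that knows nothing about the framework.
function sum_values(iterable $values, array &$log): int
{
    $log[] = 'sum_values: start';
    $sum = 0;
    foreach ($values as $v) {
        $sum += $v;
    }
    $log[] = 'sum_values: done';
    return $sum;
}

// Lazy: the exception escapes from inside sum_values()'s loop.
$lazyLog = [];
try {
    sum_values(frameworkRows(), $lazyLog);
} catch (SomeFrameworkException $e) {
    $lazyLog[] = 'caught: ' . $e->getMessage();
}

// Eager: materializing first raises the exception at the call site,
// before any generic code runs.
$eagerLog = [];
try {
    $rows = iterator_to_array(frameworkRows(), false);
    sum_values($rows, $eagerLog);
} catch (SomeFrameworkException $e) {
    $eagerLog[] = 'caught: ' . $e->getMessage();
}
```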

4) It is possible to implement a lazy alternative to CachedIterable that only loads
   values as needed.
   However, I hadn't proposed it due to doubts that 2/3 of voters would consider it
   widely useful enough to be included in PHP rather than as a userland or PECL library.
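One possible shape for such a lazy alternative is sketched below. It is a hypothetical userland design, not a proposal: it reads from the source only on demand, caches every pair it has read, and replays the cache so that repeated and nested iterations work. It assumes PHP 8.0+, and `fetchedCount()` exists only to make the laziness observable.

```php
<?php
declare(strict_types=1);

// Hypothetical lazy variant: reads from the source only when a consumer
// needs the next element, and caches every [key, value] pair it has read.
final class LazyCachedIterable implements IteratorAggregate
{
    private Iterator $source;
    private bool $started = false;
    /** @var list<array{0: mixed, 1: mixed}> */
    private array $pairs = [];

    public function __construct(iterable $input)
    {
        // Wrap any iterable in a generator; `yield from` forwards the
        // original keys, including repeated ones. Nothing runs yet.
        $this->source = (static function () use ($input): Generator {
            yield from $input;
        })();
    }

    // Pulls one more pair from the source into the cache; false when exhausted.
    private function fetchNext(): bool
    {
        if ($this->started) {
            $this->source->next();
        } else {
            // The first valid() call runs the source up to its first element.
            $this->started = true;
        }
        if (!$this->source->valid()) {
            return false;
        }
        $this->pairs[] = [$this->source->key(), $this->source->current()];
        return true;
    }

    public function getIterator(): Generator
    {
        // Replay cached pairs, fetching more only when the cache runs out.
        // Another iteration (e.g. a nested loop) may grow the cache meanwhile.
        for ($i = 0; ; $i++) {
            if ($i >= count($this->pairs) && !$this->fetchNext()) {
                return;
            }
            [$key, $value] = $this->pairs[$i];
            yield $key => $value;
        }
    }

    public function fetchedCount(): int
    {
        return count($this->pairs);
    }
}

// Usage: a source that records when each element is actually produced.
function noisySource(array &$log): Generator
{
    foreach ([10, 20, 30] as $k => $v) {
        $log[] = "produced $k";
        yield $k => $v;
    }
}

$log = [];
$lazy = new LazyCachedIterable(noisySource($log));
// $log is still empty here: nothing has been read from the source yet.
```

The trade-off described in points 2 and 3 remains: the source's exceptions and side effects surface whenever some consumer advances past the cached prefix, rather than at construction time.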

Additionally,

CachedIterables are much more memory-efficient than existing options such as arrays
(https://wiki.php.net/rfc/cachediterable#cachediterables_are_memory-efficient).
The only thing more efficient in PHP's core modules is SplFixedArray,
and that only allows keys `0..n-1`.
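To illustrate why plain arrays are not a drop-in replacement for arbitrary key-value sequences (including the repeated keys mentioned in the original RFC announcement), the following sketch compares what existing built-ins keep. The generator `repeatedKeys()` is a hypothetical example source.

```php
<?php
declare(strict_types=1);

// Hypothetical source whose keys repeat.
function repeatedKeys(): Generator
{
    yield 'x' => 1;
    yield 'x' => 2;
    yield 'y' => 3;
}

// Preserving keys: the second 'x' overwrites the first.
$withKeys = iterator_to_array(repeatedKeys(), true);      // ['x' => 2, 'y' => 3]

// Dropping keys: every value survives, but the keys are gone.
$withoutKeys = iterator_to_array(repeatedKeys(), false);  // [1, 2, 3]

// An array of [key, value] pairs keeps both, at the cost of one extra
// small array per element.
$pairs = [];
foreach (repeatedKeys() as $key => $value) {
    $pairs[] = [$key, $value];
}

// SplFixedArray is compact, but it only accepts integer offsets 0..n-1,
// so the string keys cannot be carried over.
try {
    SplFixedArray::fromArray($withKeys);
    $rejected = false;
} catch (Throwable $e) {
    $rejected = true;
}
```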

Regards,
Tyson
