Most comments inline, but first some more background. We have many existing operators on Iterators that are "lazy". By lazy, I just mean that they return an Iterator. Lazy in this context is a relative term: while all the methods return an Iterator, they have varying degrees of laziness (repeat has storage implications; sort and reverse read in the whole source before emitting elements lazily). Here is the list:
chop, collate, drop, dropRight, dropWhile, flatten, flattenMany, indexed, init, injectAll, interleave, plus, repeat, reverse, sort, tail, take, takeWhile, toSorted, toUnique, unique, withIndex, zip, zipAll

There are also a bunch of operators that are eager and terminal-like in nature, so there is no benefit in a lazy version: any, average, every, find/findResult/findIndexOf (singular), count, countBy, inject, etc.

There are 3 common and 3 less common operators that are currently eager and would benefit from having a lazy (read: Iterator) return type:

common: collect, collectMany, findAll
less common: collectEntries, findIndexValues, findResults

One option is to break compatibility and just change the return type to an Iterator (plus provide a bridge method for pre-compiled @CS code). I would be less concerned about doing this for the three less common methods. We'd release note it, of course. Actually, collectMany usage is probably rare too. For collect and findAll, I suspect their usage with Iterators is also rare, but much less so than the others.

If we don't want to break compatibility, we need to keep the eager one and provide a different name to indicate the "lazy/non-terminal" one. The "Lazy" and "Then" suffixes were meant to do that.

Other comments inline.

On Wed, Apr 9, 2025 at 9:22 PM Jochen Theodorou <blackd...@gmx.org> wrote:
>
> On 09.04.25 03:25, Paul King wrote:
> > Hi folks,
> >
> > [I sent this to grails dev list but meant to send it here and CC them
> > for feedback - anyway, it is here now, apologies if you see this
> > twice.]
> >
> > I have been looking at the functionality in Groovy-stream[1] and
> > Gatherers4J[2] lately with a view to filling any gaps in Groovy's
> > iterator DGM methods. I'm not trying to replicate everything they
> > contain, just looking for the most useful functionality Groovy might
> > be missing.
> >
> > The biggest missing pieces at this point in my mind are lazy (Iterator
> > return value) variants of findAll, collect, and collectMany. Groovy's
> > current variants are eager (return collections and lists).
> > Groovy-stream gets around this by adding stream-named variants:
> > filter, map, and flatMap.
>
>
> does Iterator really automatically mean lazy? Is this possible then?
>
> def l = [1,2,3]
> def iterator = l.iterator().findAll{it>1}
> l[0] = 20
> assert iterator.toList() == [20,2,3]

When I say lazy, I just mean a normal Java iterator. It is lazy in the sense that it won't return elements until you call next(). The findAllLazy method does prime itself by finding the first matching element, so that it knows what to return for hasNext(). That means we can change (if we don't mind potential chaos) the underlying collection backing the iterator. In our case, 2 is the first element satisfying the predicate, so we can change anything after that:

def source = [1,2,3]
def list = source.iterator().findAll{ it>1 } // eager
source[0] = 20
assert list == [2,3]

source = [1,2,3]
def iterator = source.iterator().findAllLazy{ it>1 }
source[0] = 20 // too late, primed past here already
assert iterator.toList() == [2,3]

source = [3,2,1]
iterator = source.iterator().findAllLazy{ it>1 }
source[2] = 20
assert iterator.toList() == [3,2,20]

> The combination of lazy and mutable can be problematic. Also, can I do
> iterator.toList() multiple times?

These are just normal Java iterators; calling next() on an exhausted iterator throws a NoSuchElementException.

> > One option is to break backwards compatibility (Groovy 5 only). So
> > only for the versions of those methods which take an Iterator as
> > input, change the return type to be Iterator. Given how widely used
> > those methods are, I don't think that is an option for us.
>
> well... we could have lazyIt() which return a LazyIterator. that would
> make things clear.

Yes, we could do something like create a wrapper (Iterable/Array) and have an unchanged method name on the wrapper class. We don't do that for the other 24 existing Iterator methods listed at the start of this message.

> > Actually, findAll currently doesn't have an Iterator variant, so we
> > could add that but it would still be a behavioral compatibility
> > breakage since the Object version is used for Iterators and it returns
> > a list.
> >
> > So, we could give up on lazy variants for those methods, but again
> > given how commonly used they are, that is a pretty big gap.
> >
> > So, the other option is to provide alternative names. The best to me seem:
> >
> > (A) findAllLazy, collectLazy, collectManyLazy
> > (B) findAllThen, collectThen, collectManyThen
> > (C) filter, map, flatMap
> > (D) something else?
>
> (E) lazyIt()

See above.

> I am pretty sure it will not be considered as because it is a bit
> ugly... but frankly I am no fan of A at all for the very same reason. B
> I find more nice because it is more semantics based.
>
> hmm... what happens if you mix lazy and non-lazy... like for example
> findAllthen(...).findAll(...)

We don't do anything special here; it will be normal Java semantics: Iterator + eager = eager.

> > Option (C) is what Groovy-stream did and would be familiar to Java
> > Stream users but folks are going to ask, why can't I have that "alias"
> > for Iterables and arrays, but the intent here is just for the Iterator
> > variants. I think Lazy best conveys that. Use without "Lazy" for the
> > eager (think terminal operator) variant and with "Lazy" for the lazy
> > (think intermediate operator) variant. It also is easier to extend,
> > the fourth method in terms of gaps is collectEntries, which currently
> > returns a Map. An Iterator<Map.Entry> return value could be made for
> > collectEntriesLazy if we wanted.
>
> somehow does not really convince me all.
>
> > Note that many of our operators are terminal in nature, find, count*,
> > inject, etc, so this isn't about doing this for all operators
> > eventually.
>
> but findAllThen does not come over as terminal. If they are supposed to,
> then I don't find the names fitting in the cases of A and B.

findAllThen isn't meant to hint at being terminal; it was meant to hint at laziness, since with lazy methods you'd typically chain the next method.

> bye jochen
>
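
For anyone who wants to see the priming behaviour discussed above in isolation, here is a minimal plain-Java sketch. It is not the Groovy DGM implementation: the names PrimedFilterSketch, filterLazily and drain are hypothetical, chosen only for this illustration. It assumes a filtering iterator that looks for its first match at construction time, which is how the findAllLazy behaviour is described in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical helper for illustration only; not Groovy's DGM code.
final class PrimedFilterSketch {

    // Returns an iterator over the elements of source that satisfy keep.
    // It "primes" itself: the first match is located when the iterator is created.
    public static <T> Iterator<T> filterLazily(Iterator<T> source, Predicate<? super T> keep) {
        return new Iterator<T>() {
            private T pending;          // next matching element, if any
            private boolean hasPending; // true when pending holds a match

            { advance(); }              // priming step: read ahead to the first match

            // Consume the source until the next match (or the end) is found.
            private void advance() {
                hasPending = false;
                while (source.hasNext()) {
                    T candidate = source.next();
                    if (keep.test(candidate)) {
                        pending = candidate;
                        hasPending = true;
                        return;
                    }
                }
            }

            @Override
            public boolean hasNext() {
                return hasPending;
            }

            @Override
            public T next() {
                if (!hasPending) {
                    throw new NoSuchElementException();
                }
                T result = pending;
                advance(); // read ahead again so hasNext() stays cheap
                return result;
            }
        };
    }

    // Eager step: drains whatever is left in the iterator into a list.
    public static <T> List<T> drain(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        it.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>(Arrays.asList(1, 2, 3));
        Iterator<Integer> primed = filterLazily(source.iterator(), n -> n > 1);
        source.set(0, 20); // too late: priming already read past index 0
        System.out.println(drain(primed)); // [2, 3]

        source = new ArrayList<>(Arrays.asList(3, 2, 1));
        primed = filterLazily(source.iterator(), n -> n > 1);
        source.set(2, 20); // index 2 has not been read yet
        System.out.println(drain(primed)); // [3, 2, 20]
    }
}
```

Once drained, the iterator is exhausted: hasNext() returns false and a further next() throws NoSuchElementException, as with any other Java iterator.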