Most comments are inline, but first some more background. We have many
existing operators on Iterators that are "lazy". By lazy, I just mean
that they return an Iterator. Lazy in this context is a relative term:
while all the methods return an Iterator, they have varying degrees of
laziness (repeat has storage implications; sort and reverse read in the
whole source before emitting elements lazily). Here is the list:

chop, collate, drop, dropRight, dropWhile, flatten, flattenMany,
indexed, init, injectAll, interleave, plus, repeat, reverse, sort,
tail, take, takeWhile, toSorted, toUnique, unique, withIndex, zip,
zipAll
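
To make the "lazy" label concrete, here is a small sketch using two of
the listed methods, take and takeWhile, on an endless source. Naturals
is a hypothetical helper class written only for this illustration;
since it never runs out, an eager implementation could not terminate.

```groovy
// Hypothetical endless source, used only for this illustration.
class Naturals implements Iterator<Integer> {
    private int n = 0
    boolean hasNext() { true }
    Integer next() { ++n }
}

// take returns an Iterator, so it is safe to call on an endless source
def firstThree = new Naturals().take(3)
assert firstThree instanceof Iterator
assert firstThree.toList() == [1, 2, 3]

// takeWhile also returns an Iterator and stops once the predicate fails
def small = new Naturals().takeWhile { it < 4 }
assert small instanceof Iterator
assert small.toList() == [1, 2, 3]
```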

There are also a bunch of operators that are eager and are
terminal-like in nature, so there is no benefit in a lazy version:

any, average, every, find/findResult/findIndexOf (singular), count,
countBy, inject, etc.

There are 3 common and 3 less common operators that are currently
eager that would benefit from having a lazy (read Iterator) return
type:

common: collect, collectMany, findAll
less common: collectEntries, findIndexValues, findResults

One option is to break compatibility and just change the return type
to an Iterator (plus provide a bridge method for pre-compiled @CS
code). I would be less concerned about doing this for the three less
common methods. We'd release-note it, of course. Actually, collectMany
usage is probably rare too. For collect and findAll, I suspect their
usage with Iterators is also rare, though less rare than for the
others.

If we don't want to break compatibility, we need to keep the eager one
and provide a different name to indicate the "lazy/non-terminal" one.
"Lazy" and "Then" suffixes were meant to do that.

Other comments inline.

On Wed, Apr 9, 2025 at 9:22 PM Jochen Theodorou <blackd...@gmx.org> wrote:
>
> On 09.04.25 03:25, Paul King wrote:
> > Hi folks,
> >
> > [I sent this to grails dev list but meant to send it here and CC them
> > for feedback - anyway, it is here now, apologies if you see this
> > twice.]
> >
> > I have been looking at the functionality in Groovy-stream[1] and
> > Gatherers4J[2] lately with a view to filling any gaps in Groovy's
> > iterator DGM methods. I'm not trying to replicate everything they
> > contain, just looking for the most useful functionality Groovy might
> > be missing.
> >
> > The biggest missing pieces at this point in my mind are lazy (Iterator
> > return value) variants of findAll, collect, and collectMany. Groovy's
> > current variants are eager (return collections and lists).
> > Groovy-stream gets around this by adding stream-named variants:
> > filter, map, and flatMap.
>
>
> does Iterator really automatically mean lazy? Is this possible then?
>
> def l = [1,2,3]
> def iterator = l.iterator().findAll{it>1}
> l[0] = 20
> assert iterator.toList() == [20,2,3]

When I say lazy, I just mean a normal Java iterator. It is lazy in
the sense that it won't return elements until you call next(). The
findAllLazy method does prime itself by finding the first matching
element so it knows what to return for hasNext(). That means we can
change (if we don't mind potential chaos) the underlying collection
backing the iterator, but only changes beyond the primed position are
seen. In your example, 2 is the first element satisfying the
predicate, so we can change anything after that:

def source = [1,2,3]
def list = source.iterator().findAll{ it>1 } // eager
source[0] = 20
assert list == [2,3]

source = [1,2,3]
def iterator = source.iterator().findAllLazy{ it>1 }
source[0] = 20 // too late, primed past here already
assert iterator.toList() == [2,3]

source = [3,2,1]
iterator = source.iterator().findAllLazy{ it>1 }
source[2] = 20
assert iterator.toList() == [3,2,20]
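
The priming behaviour can be sketched with a hand-rolled filtering
iterator. PrimedFilter is a hypothetical stand-in written for this
message, not the proposed DGM implementation; it only assumes that the
first match is located eagerly, in the constructor.

```groovy
// Hypothetical stand-in for a primed, lazy findAll; not the DGM code.
class PrimedFilter implements Iterator {
    private final Iterator source
    private final Closure predicate
    private Object nextItem
    private boolean found

    PrimedFilter(Iterator source, Closure predicate) {
        this.source = source
        this.predicate = predicate
        advance()                      // prime: locate the first match now
    }

    // Pull from the source until a match is found or the source runs out
    private void advance() {
        found = false
        while (source.hasNext()) {
            def candidate = source.next()
            if (predicate(candidate)) {
                nextItem = candidate
                found = true
                return
            }
        }
    }

    boolean hasNext() { found }

    Object next() {
        if (!found) throw new NoSuchElementException()
        def result = nextItem
        advance()                      // look ahead for the following match
        return result
    }
}

def source = [1, 2, 3]
def iter = new PrimedFilter(source.iterator(), { it > 1 })
source[0] = 20                         // already read during priming
assert iter.toList() == [2, 3]

source = [3, 2, 1]
iter = new PrimedFilter(source.iterator(), { it > 1 })
source[2] = 20                         // not read yet, so the change is seen
assert iter.toList() == [3, 2, 20]
```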


> The combination of lazy and mutable can be problematic. Also, can I do
> iterator.toList() multiple times?

These are just normal Java iterators; calling next() on an exhausted
iterator throws a NoSuchElementException.
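
As a small sketch of what that means for repeated toList() calls, using
only a plain list iterator and the existing toList method:

```groovy
def iter = [1, 2].iterator()
assert iter.toList() == [1, 2]   // the first pass drains the iterator
assert !iter.hasNext()
assert iter.toList() == []       // nothing left, so the second list is empty

// Calling next() directly on the exhausted iterator is what throws
def thrown = null
try {
    iter.next()
} catch (NoSuchElementException e) {
    thrown = e
}
assert thrown instanceof NoSuchElementException
```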

> > One option is to break backwards compatibility (Groovy 5 only). So
> > only for the versions of those methods which take an Iterator as
> > input, change the return type to be Iterator. Given how widely used
> > those methods are, I don't think that is an option for us.
>
> well... we could have lazyIt() which return a LazyIterator. that would
> make things clear.

Yes, we could do something like create a wrapper (for an Iterable/array)
and have an unchanged method name on the wrapper class. We don't do that
for the other 24 existing Iterator methods listed at the start of this
message.

> > Actually, findAll currently doesn't have an Iterator variant, so we
> > could add that but it would still be a behavioral compatibility
> > breakage since the Object version is used for Iterators and it returns
> > a list.
> >
> > So, we could give up on lazy variants for those methods, but again
> > given how commonly used they are, that is a pretty big gap.
> >
> > So, the other option is to provide alternative names. The best to me seem:
> >
> > (A) findAllLazy, collectLazy, collectManyLazy
> > (B) findAllThen, collectThen, collectManyThen
> > (C) filter, map, flatMap
> > (D) something else?
>
> (E) lazyIt()

See above.

> I am pretty sure it will not be considered as because it is a bit
> ugly... but frankly I am no fan of A at all for the very same reason. B
> I find more nice because it is more semantics based.
>
> hmm... what happens if you mix lazy and non-lazy... like for example
> findAllthen(...).findAll(...)

We don't do anything special here; it will be normal Java semantics:

Iterator + eager = eager.
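
A sketch of that rule using only existing methods (drop and takeWhile
standing in for the lazy steps, collect for the eager one, since the
proposed names don't exist yet):

```groovy
def source = [1, 2, 3, 4, 5]

// drop and takeWhile each return an Iterator (lazy steps)
def lazy = source.iterator().drop(1).takeWhile { it < 5 }
assert lazy instanceof Iterator

// collect on an Iterator currently returns a List (eager step),
// so the chain as a whole is eager
def result = lazy.collect { it * 10 }
assert result instanceof List
assert result == [20, 30, 40]
```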

> > Option (C) is what Groovy-stream did and would be familiar to Java
> > Stream users but folks are going to ask, why can't I have that "alias"
> > for Iterables and arrays, but the intent here is just for the Iterator
> > variants. I think Lazy best conveys that. Use without "Lazy" for the
> > eager (think terminal operator) variant and with "Lazy" for the lazy
> > (think intermediate operator) variant. It also is easier to extend,
> > the fourth method in terms of gaps is collectEntries, which currently
> > returns a Map. An Iterator<Map.Entry> return value could be made for
> > collectEntriesLazy if we wanted.
>
> somehow does not really convince me all.
>
> > Note that many of our operators are terminal in nature, find, count*,
> > inject, etc, so this isn't about doing this for all operators
> > eventually.
>
> but findAllThen does not come over as terminal. If they are supposed to,
> then I don't find the names fitting in the cases of A and B.

findAllThen isn't meant to hint at being terminal; it was meant to
hint at laziness, since with lazy methods you'd typically chain the
next method.

> bye jochen
>
