Hi!

I've been thinking a bit about the problem and have arrived at the view that it 
is not necessarily worth adding a new type to represent an index and an element 
(or trying to repurpose an existing one), as I think the ergonomics of 
providing a mapper BiFunction are better, something along the lines of:

public static <T, R> Gatherer<T, ?, R> mapIndexed(
       BiFunction<Long, ? super T, ? extends R> mapper) {
   class Index { long at = 0; }
   return Gatherer.ofSequential(
       Index::new,
       (idx, e, d) -> d.push(mapper.apply(idx.at++, e))
   );
}

(Using longs for indexing seems sensible for something that could have 
unbounded length.)

This would mean that if you have your own Pair class, or want to represent the 
result as a Map.Entry, it's pretty straightforward to do:

stream.gather(mapIndexed(Pair::of))...
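
As a rough, self-contained illustration of the Map.Entry variant (a sketch 
assuming JDK 24 or later, where the Gatherer API is final; the class name 
MapIndexedSketch and the helper indexAll are made up for this example):

```java
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.stream.Gatherer;

// Hypothetical holder class, for illustration only.
class MapIndexedSketch {

    // The mapIndexed gatherer from the message above, unchanged in behaviour.
    static <T, R> Gatherer<T, ?, R> mapIndexed(
            BiFunction<Long, ? super T, ? extends R> mapper) {
        class Index { long at = 0; }
        return Gatherer.ofSequential(
                Index::new,
                (idx, e, d) -> d.push(mapper.apply(idx.at++, e)));
    }

    // Hypothetical helper: pairs each element with its index as a Map.Entry.
    static List<Map.Entry<Long, String>> indexAll(List<String> input) {
        // The explicit local type pins T and R so that Map::entry resolves.
        Gatherer<String, ?, Map.Entry<Long, String>> indexed = mapIndexed(Map::entry);
        return input.stream().gather(indexed).toList();
    }

    public static void main(String[] args) {
        System.out.println(indexAll(List.of("a", "b", "c")));
    }
}
```

Note that Map.entry rejects null keys and values, so a stream containing null 
elements would need a different pair type.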

It is unfortunate that parallelization takes a hit in this use-case, but 
knowing which indices a sub-segment of the Stream covers depends on the size of 
the stream being known, and I wouldn't be surprised if out-of-order processing 
of indices surprised people, so perhaps an ofSequential(…) isn't all that bad.
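
To make the trade-off concrete, here is a small sketch (assuming JDK 24 or 
later; ParallelIndexSketch and indicesFollowEncounterOrder are hypothetical 
names) that runs the same gatherer on a parallel, ordered stream. Because the 
gatherer is created with ofSequential, its integrator is expected to see the 
elements in encounter order, so each index should match its element's position:

```java
import java.util.function.BiFunction;
import java.util.stream.Gatherer;
import java.util.stream.IntStream;

// Hypothetical holder class, for illustration only.
class ParallelIndexSketch {

    static <T, R> Gatherer<T, ?, R> mapIndexed(
            BiFunction<Long, ? super T, ? extends R> mapper) {
        class Index { long at = 0; }
        return Gatherer.ofSequential(
                Index::new,
                (idx, e, d) -> d.push(mapper.apply(idx.at++, e)));
    }

    // Streams 0..n-1 in parallel and checks that every element equals its index.
    static boolean indicesFollowEncounterOrder(int n) {
        Gatherer<Integer, ?, Boolean> matchesIndex =
                mapIndexed((i, e) -> i == e.longValue());
        return IntStream.range(0, n)
                .boxed()
                .parallel()            // upstream may still run in parallel
                .gather(matchesIndex)  // this step itself is evaluated sequentially
                .allMatch(ok -> ok);
    }

    public static void main(String[] args) {
        System.out.println(indicesFollowEncounterOrder(10_000));
    }
}
```

The cost is that the gather step itself does not scale with the number of 
threads; only the stages before and after it can.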

With that being said, new Gatherers should be added to the stdlib only after a 
thorough evaluation of need.

Cheers,
√


Viktor Klang
Software Architect, Java Platform Group
Oracle
________________________________
From: core-libs-dev <core-libs-dev-r...@openjdk.org> on behalf of Olexandr 
Rotan <rotanolexandr...@gmail.com>
Sent: Thursday, 5 December 2024 18:20
To: Henrik Wall <xehpuk....@gmail.com>
Cc: core-libs-dev <core-libs-dev@openjdk.org>
Subject: Re: JEP 473: Proposal for new built-in gatherer `indexed`


Hi. I made a proposal along these lines (the one Chen mentioned) approximately 
half a year ago. At the time I insisted on creating a Stream sub-interface, and 
even got a working prototype for sequential streams, but there was such a huge 
complexity blowup in parallelisation that I decided to drop it. Gatherers can 
be used pretty easily for this task, but only via ofSequential, sacrificing 
parallelism. So basically, parallelism (or performance) is the pain point here. 
I am not saying that it is impossible to reconcile enumeration and 
parallelisation, but it would require huge effort and invasive changes in the 
current *Pipeline implementations, or enormous amounts of code duplication.

On Thu, Dec 5, 2024, 18:48 Henrik Wall <xehpuk....@gmail.com> wrote:
Hey folks,

Not having access to the index of an element of a stream is often a
reason to fall back to a traditional loop, at least for me. I'd love
to have `Gatherers.indexed()` that looks something like this:

public static <TR> Gatherer<TR, ?, Map.Entry<Integer, TR>> indexed() {
    return Gatherer.ofSequential(
            () -> new int[1],
            Gatherer.Integrator.ofGreedy((state, element, downstream) ->
                    downstream.push(Map.entry(state[0]++, element)))
    );
}

(Potentially with a custom pair class to avoid auto-boxing.)
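
One way the custom-pair idea might look (a sketch only, assuming JDK 24 or 
later; the record Indexed and the class IndexedRecordSketch are hypothetical 
names, not part of any proposal). A record with a primitive long component 
avoids boxing the index:

```java
import java.util.List;
import java.util.stream.Gatherer;

// Hypothetical holder class, for illustration only.
class IndexedRecordSketch {

    // Hypothetical pair type: the index is a primitive long, so it is never boxed.
    record Indexed<T>(long index, T value) {}

    static <T> Gatherer<T, ?, Indexed<T>> indexed() {
        return Gatherer.ofSequential(
                () -> new long[1],  // one-element array used as a mutable counter
                Gatherer.Integrator.ofGreedy((state, element, downstream) ->
                        downstream.push(new Indexed<T>(state[0]++, element))));
    }

    // Hypothetical helper showing how the gatherer would be used.
    static List<Indexed<String>> indexAll(List<String> input) {
        Gatherer<String, ?, Indexed<String>> g = indexed();
        return input.stream().gather(g).toList();
    }

    public static void main(String[] args) {
        indexAll(List.of("a", "b")).forEach(System.out::println);
    }
}
```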

In other popular languages like Python or Rust, this is also called `enumerate`.

Any chance to get that in a future release?

Regards,
Henrik
