Hi Tagir,

Thanks for picking this up.

I'll be at a conference this week and I won't have much time to discuss this in detail until afterward. But here are some quick thoughts about the proposal.

I'd suggest focusing on the API first before worrying about how to track the stream state with booleans, etc. Is the API convenient to use, and how well does it support the use cases we envision for it?

In particular, I can imagine a number of cases where it would be very helpful to be able to support an empty stream, or where the computation to produce the first element is the same as the computation to produce subsequent elements. Requiring a value for the first stream element is at odds with that.

Here are some ideas for use cases to try out:

 - a series of dice rolls representing a round of craps [1]
 - elements drawn from a queue until the queue is empty or until
   a sentinel is reached
 - a sequence of numbers that (probably) terminates but whose length
   isn't necessarily known in advance (e.g. Collatz sequence [2])

[1] https://en.wikipedia.org/wiki/Craps

[2] https://en.wikipedia.org/wiki/Collatz_conjecture

Note that in some cases the sentinel value that terminates the stream should be part of the stream, and in other cases it's not.

I'm sure you can find more uses cases by perusing Stack Overflow. :-)

I'm a bit skeptical of the use of "iterate" for producing a finite stream. There are the usual issues with overloading, but there's also potential confusion as some forms of iterate() are infinite and others finite. I'll suggest the name "produce" instead, but there are surely better terms.

One thing to think about is where the state of the producer is stored. Is it expected to be in an argument that's passed to each invocation of the functional argument, or is it expected to be captured? I don't think there's an answer in isolation; examining use cases would probably shed some light here.

Here are a few API ideas (wildcards elided):

--

<T> Stream<T> iterate(T seed, Predicate<T> predicate, UnaryOperator<T> f)

The API from your proposal, for comparison purposes.

--

<T> Stream<T> produce(Supplier<Optional<T>>)

Produces elements until empty Optional is returned. This box/unboxes every element, maybe(?) alleviated by Valhalla.

--

<T> Stream<T> produce(BooleanSupplier, Supplier<T>)

Calls the BooleanSupplier; if true the next stream element is what's returned by calling the Supplier. If BooleanSupplier returns false, end of stream. If you have an iterator already, this enables

produce(iterator::hasNext, iterator::next)

But if you don't have an iterator already, coming up with the functions to satisfy the iterator-style protocol is sometimes painful.

--

<T> Stream<T> produce(Predicate<Consumer<T>> advancer)

This has an odd signature, but the function is like Spliterator.tryAdvance(). It must either call the consumer once and return true, or return false without calling the consumer.

--

<T> Stream<T> produce(Consumer<Consumer<T>> advancer)

A variation of the above, without a boolean return. The advancer calls the consumer one or more times to add elements to the stream. End of stream occurs when the advancer doesn't call the consumer.

--

<T> Stream<T> produce(Supplier<Stream<T>>)

A variation of Supplier<Optional<T>> where the supplier returns a stream containing zero or more elements. The stream terminates if the supplier returns an empty stream. There "boxing" overhead here, but we don't seem to be bothered by this with flatMap().

--

s'marks


On 2/14/16 6:53 AM, Tagir F. Valeev wrote:
Hello!

I wanted to work on foldLeft, but Brian asked me to take this issue
instead. So here's webrev:
http://cr.openjdk.java.net/~tvaleev/webrev/8072727/r1/

I don't like iterator-based Stream source implementations, so I made
them AbstractSpliterator-based. I also implemented manually
forEachRemaining as, I believe, this improves the performance in
non-short-circuiting cases.

I also decided to keep two flags (started and finished) to track the
state. Currently existing implementation of infinite iterate() does
not use started flag, but instead reads one element ahead for
primitive streams. This seems wrong to me and may even lead to
unexpected exceptions (*). I could get rid of "started" flag for
Stream.iterate() using Streams.NONE, but this would make object
implementation different from primitive implementations. It would also
be possible to keep single three-state variable (byte or int,
NOT_STARTED, STARTED, FINISHED), but I doubt that this would improve
the performance or footprint. Having two flags looks more readable to
me.

Currently existing two-arg iterate methods can now be expressed as a
partial case of the new method:

public static<T> Stream<T> iterate(final T seed, final UnaryOperator<T> f) {
     return iterate(seed, x -> true, f);
}
(same for primitive streams). I may do this if you think it's
reasonable.

I created new test class and added new iterate sources to existing
data providers.

Please review and sponsor!

With best regards,
Tagir Valeev.

(*) Consider the following code:

int[] data = {1,2,3,4,-1};
IntStream.iterate(0, x -> data[x])
          .takeWhile(x -> x >= 0)
          .forEach(System.out::println);

Currently this unexpectedly throws an AIOOBE, because
IntStream.iterate unnecessarily tries to read one element ahead.

Reply via email to