Thanks for the feedback. I really like Brian's suggestion for a default Collector.supplier(sizeEstimate), since exposing the full stream characteristics is overkill.
I don't think the use of this improvement is limited to pre-sizing ArrayLists, though. Some other Collectors that come to mind are StringJoiner (Collectors.joining) and my FutureJoiner [1], which collects a stream of CompletableFuture<T> into a CompletableFuture<List<T>>. And even if Collectors.toList was the only collector to benefit from pre-sizing, I think it is still worth the effort, as this is perhaps the most used Collector. [1]: https://gist.github.com/AugustNagro/e92c48e39f891f16500f36c33839383a On Sun, Feb 24, 2019 at 8:51 AM Brian Goetz <[email protected]> wrote: > We did consider this problem when designing the Collector API; for > example, it would have been nice if we could have a `toArray()` > collector that had all the optimizations of `Stream::toArray`. > > When we looked into it, we found a number of concerning details that > caused us to turn back (many of which you've already identified), such > as the difficulty of managing parallelism, the intrusion into the API, > etc. What we found is that all this additional complexity was basically > in aid of only a few use cases -- such as collecting into a pre-sized > ArrayList. Where are the next hundred use cases for such a mechanism, > that would justify this incremental API complexity? We didn't see them, > but maybe there are some. > > A less intrusive API direction might be a version of Collector whose > supplier function took a size-estimate argument; this might even help in > parallel since it allows for intermediate results to start with a better > initial size guess. (And this could be implemented as a default that > delegates to the existing supplier.) Still, not really sure this > carries its weight. > > > The below code returns false, for example (is this > > a bug?): > > > > Stream.of(1,2,3).parallel().map(i -> > > i+1).spliterator().hasCharacteristics(Spliterator.CONCURRENT) > > Not a bug. The `Stream::spliterator` method (along with `iterator`) is > provided as an "escape hatch" for operations that need to get at the > elements but which cannot be easily expressed using the Stream API. > This method makes a good-faith attempt to propagate a reasonable set of > characteristics (for a stream with no intermediate ops, it does delegate > to the underlying source for its spliterator), but, given that `Stream` > is not in fact a data structure, when there is nontrivial computation on > the actual source, a relatively bare-bones spliterator is provided. > > While this could probably be improved in specific cases, the return on > effort (and risk) is likely to be low, because `Stream::spliterator` is > already an infrequently used method, and it would only matter in a small > fraction of those cases. So you're in "corner case of a corner case" > territory. > > > On 2/23/2019 5:27 PM, August Nagro wrote: > > Calling Stream.collect(Collector) is a popular terminal stream operation. > > But because the collect methods provide no detail of the stream's > > characteristics, collectors are not as efficient as they could be. > > > > For example, consider a non-parallel, sized stream that is to be > collected > > as a List. This is a very common case for streams with a Collection > source. > > Because of the stream characteristics, the Collector.supplier() could > > initialize a list with initial size (since the merging function will > never > > be called), but the current implementation prevents this. > > > > I should note that the characteristics important to collectors are those > > defined by Spliterator, like: Spliterator::characteristics, > > Spliterator::estimateSize, and Spliterator::getExactSizeIfKnown. > > > > One way this enhancement could be implemented is by adding a method > > Stream.collect(Function<ReadOnlySpliterator, Collector> > collectorBuilder). > > ReadOnlySpliterator would implement the spliterator methods mentioned > > above, and Spliterator would be made to implement this interface. > > > > For example, here is a gist with what Collectors.toList could look like: > > https://gist.github.com/AugustNagro/e66a0ddf7d47b4f11fec8760281bb538 > > > > ReadOnlySpliterator may need to be replaced with some stream specific > > abstraction, however, since Stream.spliterator() does not return with the > > correct characteristics. The below code returns false, for example (is > this > > a bug?): > > > > Stream.of(1,2,3).parallel().map(i -> > > i+1).spliterator().hasCharacteristics(Spliterator.CONCURRENT) > > > > Looking forward to your thoughts, > > > > - August Nagro > >
