Hello, Brian! On Mon, Jan 31, 2022 at 12:01 AM Brian Goetz <[email protected]> wrote: > > Are you proposing dropping SIZED from the spliterator for arrays? This would > undermine all the array-based optimizations (e.g., toArray), which seems a > bad trade. I realize the splitting heuristics are frustrating for a number > of use cases, but this seems like throwing the baby out with the bathwater.
No, c'mon, I would never do this. Moreover, this behavior is covered by dozens of tests. As you can see, my change doesn't break any of the existing tests. What I'm proposing is dropping SIZED from the very-very specific arrays that are chopped out of iterators/spliterators of unknown size. Nothing else is affected. For this particular case, SIZED-ness of arrays is never used. When a top-level spliterator is neither SIZED, nor SUBSIZED (its size is completely unknown), the fact that some subspliterators are SIZED never helped to perform any optimization. We remove SIZED only in this case. So it's not a bad trade and not throwing the baby out with the bathwater. > are frustrating for a number of use cases It's quite a common use case, and people actually complain. For example: https://stackoverflow.com/q/62653471/4856258 - here a topic starter stumbled upon this problem. The answer was to specify the size explicitly but it's also possible that size is not known in advance https://youtrack.jetbrains.com/issue/IDEA-287488 - here user complains that replacing IntStream.iterate(seed, update).takeWhile(condition) with a seemingly equivalent IntStream.iterate(seed, condition, update) undermines the parallelism. We could handle this at IntelliJ IDEA side but an inspection like "Using shorter and simpler iterate() instead of seemingly equivalent iterate().takeWhile() undermines the parallelism" does not sound great and it covers only iterate() source. With best regards, Tagir Valeev. > > (Loom is coming, and that is a good time to revisit streams support for > blocking operations, which is a big part of what people complain about with > parallel streams.) > > On 1/29/2022 11:38 AM, Tagir F.Valeev wrote: > > See the bug description for details. > > I propose a simple solution. Let's allow ArraySpliterator to be non-SIZED and > report artificial estimatedSize(), much bigger than the real one. This will > allow AbstractSpliterator and IteratorSpliterator to produce prefix whose > size is comparable to Long.MAX_VALUE (say, starting with Long.MAX_VALUE/2), > and this will enable further splitting of the prefix. This change will > drastically improve parallel streaming for affected streams of size <= 1024 > and significantly improve for streams of size 1025..20000. The cost is > higher-grained splitting for huge streams of unknown size. This might add a > minor overhead for such scenarios which, I believe, is completely tolerable. > > No public API changes are necessary, sequential processing should not be > affected, except an extra field in ArraySpliterator which increases a > footprint by 8 bytes. > > I added a simple test to ensure that at least two threads are actually used > when parallelizing Stream.iterate source. More testing ideas are welcome. > > ------------- > > Commit messages: > - JDK-8280915 Better parallelization for AbstractSpliterator and > IteratorSpliterator when size is unknown > > Changes: https://git.openjdk.java.net/jdk/pull/7279/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7279&range=00 > Issue: https://bugs.openjdk.java.net/browse/JDK-8280915 > Stats: 128 lines in 2 files changed: 96 ins; 0 del; 32 mod > Patch: https://git.openjdk.java.net/jdk/pull/7279.diff > Fetch: git fetch https://git.openjdk.java.net/jdk pull/7279/head:pull/7279 > > PR: https://git.openjdk.java.net/jdk/pull/7279 > >
