On Sat, 10 Apr 2021 14:13:55 GMT, Vladimir Sitnikov <vsitni...@openjdk.org> wrote:
>> With the introduction of `toList()`, preserving the SIZED characteristics in
>> more cases becomes more important. This patch preserves SIZED on `skip()`
>> and `limit()` operations, so now every combination of
>> `map/mapToX/boxed/asXyzStream/skip/limit/sorted` preserves size, and
>> `toList()`, `toArray()` and `count()` may benefit from this. E. g.,
>> `LongStream.range(0, 10_000_000_000L).skip(1).count()` returns result
>> instantly with this patch.
>>
>> Some microbenchmarks added that confirm the reduced memory allocation in
>> `toList()` and `toArray()` cases. Before patch:
>>
>> ref.SliceToList.seq_baseline:·gc.alloc.rate.norm       10000  thrpt  10   40235,534 ± 0,984  B/op
>> ref.SliceToList.seq_limit:·gc.alloc.rate.norm          10000  thrpt  10  106431,101 ± 0,198  B/op
>> ref.SliceToList.seq_skipLimit:·gc.alloc.rate.norm      10000  thrpt  10  106544,977 ± 1,983  B/op
>> value.SliceToArray.seq_baseline:·gc.alloc.rate.norm    10000  thrpt  10   40121,878 ± 0,247  B/op
>> value.SliceToArray.seq_limit:·gc.alloc.rate.norm       10000  thrpt  10  106317,693 ± 1,083  B/op
>> value.SliceToArray.seq_skipLimit:·gc.alloc.rate.norm   10000  thrpt  10  106430,954 ± 0,136  B/op
>>
>> After patch:
>>
>> ref.SliceToList.seq_baseline:·gc.alloc.rate.norm       10000  thrpt  10   40235,648 ± 1,354  B/op
>> ref.SliceToList.seq_limit:·gc.alloc.rate.norm          10000  thrpt  10   40355,784 ± 1,288  B/op
>> ref.SliceToList.seq_skipLimit:·gc.alloc.rate.norm      10000  thrpt  10   40476,032 ± 2,855  B/op
>> value.SliceToArray.seq_baseline:·gc.alloc.rate.norm    10000  thrpt  10   40121,830 ± 0,308  B/op
>> value.SliceToArray.seq_limit:·gc.alloc.rate.norm       10000  thrpt  10   40242,554 ± 0,443  B/op
>> value.SliceToArray.seq_skipLimit:·gc.alloc.rate.norm   10000  thrpt  10   40363,674 ± 1,576  B/op
>>
>> Time improvements are less exciting. It's likely that inlining and
>> vectorizing dominate in these tests over array allocations and unnecessary
>> copying.
>> Still, I notice a significant improvement in SliceToArray.seq_limit
>> case (2x) and mild improvement (+12..16%) in other slice tests. No
>> significant change in parallel execution time, though its performance is
>> much less stable and I didn't run enough tests.
>>
>> Before patch:
>>
>> Benchmark                         (size)   Mode  Cnt      Score      Error  Units
>> ref.SliceToList.par_baseline       10000  thrpt   30  14876,723 ±   99,770  ops/s
>> ref.SliceToList.par_limit          10000  thrpt   30  14856,841 ±  215,089  ops/s
>> ref.SliceToList.par_skipLimit      10000  thrpt   30   9555,818 ±  991,335  ops/s
>> ref.SliceToList.seq_baseline       10000  thrpt   30  23732,290 ±  444,162  ops/s
>> ref.SliceToList.seq_limit          10000  thrpt   30  14894,040 ±  176,496  ops/s
>> ref.SliceToList.seq_skipLimit      10000  thrpt   30  10646,929 ±   36,469  ops/s
>> value.SliceToArray.par_baseline    10000  thrpt   30  25093,141 ±  376,402  ops/s
>> value.SliceToArray.par_limit       10000  thrpt   30  24798,889 ±  760,762  ops/s
>> value.SliceToArray.par_skipLimit   10000  thrpt   30  16456,310 ±  926,882  ops/s
>> value.SliceToArray.seq_baseline    10000  thrpt   30  69669,787 ±  494,562  ops/s
>> value.SliceToArray.seq_limit       10000  thrpt   30  21097,081 ±  117,338  ops/s
>> value.SliceToArray.seq_skipLimit   10000  thrpt   30  15522,871 ±  112,557  ops/s
>>
>> After patch:
>>
>> Benchmark                         (size)   Mode  Cnt      Score      Error  Units
>> ref.SliceToList.par_baseline       10000  thrpt   30  14793,373 ±   64,905  ops/s
>> ref.SliceToList.par_limit          10000  thrpt   30  13301,024 ± 1300,431  ops/s
>> ref.SliceToList.par_skipLimit      10000  thrpt   30  11131,698 ± 1769,932  ops/s
>> ref.SliceToList.seq_baseline       10000  thrpt   30  24101,048 ±  263,528  ops/s
>> ref.SliceToList.seq_limit          10000  thrpt   30  16872,168 ±   76,696  ops/s
>> ref.SliceToList.seq_skipLimit      10000  thrpt   30  11953,253 ±  105,231  ops/s
>> value.SliceToArray.par_baseline    10000  thrpt   30  25442,442 ±  455,554  ops/s
>> value.SliceToArray.par_limit       10000  thrpt   30  23111,730 ± 2246,086  ops/s
>> value.SliceToArray.par_skipLimit   10000  thrpt   30  17980,750 ± 2329,077  ops/s
>> value.SliceToArray.seq_baseline    10000  thrpt   30  66512,898 ± 1001,042  ops/s
>> value.SliceToArray.seq_limit       10000  thrpt   30  41792,549 ± 1085,547  ops/s
>> value.SliceToArray.seq_skipLimit   10000  thrpt   30  18007,613 ±  141,716  ops/s
>>
>> I also modernized SliceOps a little bit, using switch expression (with no
>> explicit default!) and diamonds on anonymous classes.
>
> test/jdk/java/util/stream/test/org/openjdk/tests/java/util/stream/CountTest.java
> line 195:
>
>> 193:         assertEquals(Stream.of(1, 2, 3, 4).peek(e -> ai.getAndIncrement())
>> 194:                 .parallel().skip(1).limit(2).skip(1).count(), 1);
>> 195:         assertEquals(ai.get(), 0);
>
> Does it make sense to extract the method here and launch it with different
> input streams?

Well, it's not that easy, as there are separate tests for IntStream,
LongStream, DoubleStream and object Stream, so they cannot be generalized
under a single method. Still, extracting four methods removes some of the
repetition, so I'll do this.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3427
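
For illustration only, the "four extracted methods" idea might look roughly like the sketch below. The class name `SliceCountSketch` and the `countSliced*` helper names are hypothetical and not taken from the PR; the slice chain (`skip(1).limit(2).skip(1)`) is copied from the quoted test lines. The four stream types share no common interface for `peek`/`skip`/`limit`, which is why one helper per type is assumed here.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.DoubleStream;
import java.util.stream.IntStream;
import java.util.stream.LongStream;
import java.util.stream.Stream;

// Hypothetical sketch: one helper per stream flavor, each applying the same
// peek/parallel/skip/limit/skip/count chain as the quoted CountTest lines.
class SliceCountSketch {

    static <T> long countSlicedRef(Stream<T> s, AtomicInteger peeked) {
        return s.peek(e -> peeked.getAndIncrement())
                .parallel().skip(1).limit(2).skip(1).count();
    }

    static long countSlicedInt(IntStream s, AtomicInteger peeked) {
        return s.peek(e -> peeked.getAndIncrement())
                .parallel().skip(1).limit(2).skip(1).count();
    }

    static long countSlicedLong(LongStream s, AtomicInteger peeked) {
        return s.peek(e -> peeked.getAndIncrement())
                .parallel().skip(1).limit(2).skip(1).count();
    }

    static long countSlicedDouble(DoubleStream s, AtomicInteger peeked) {
        return s.peek(e -> peeked.getAndIncrement())
                .parallel().skip(1).limit(2).skip(1).count();
    }

    public static void main(String[] args) {
        AtomicInteger ai = new AtomicInteger();

        // 4 elements -> skip(1) leaves 3 -> limit(2) leaves 2 -> skip(1) leaves 1
        long ref = countSlicedRef(Stream.of(1, 2, 3, 4), ai);
        long ints = countSlicedInt(IntStream.of(1, 2, 3, 4), ai);
        long longs = countSlicedLong(LongStream.of(1, 2, 3, 4), ai);
        long doubles = countSlicedDouble(DoubleStream.of(1, 2, 3, 4), ai);

        System.out.println("counts: " + ref + " " + ints + " " + longs + " " + doubles);

        // The quoted test expects the peek counter to stay at 0 once SIZED is
        // preserved through skip()/limit(). Whether that holds depends on the
        // JDK in use, so this sketch only prints the counter.
        System.out.println("elements seen by peek: " + ai.get());
    }
}
```

Each helper returns 1 for a four-element source regardless of JDK version; only the peek counter is sensitive to whether the pipeline is actually traversed.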