Re: RFR: 8265029: Preserve SIZED characteristics on slice operations (skip, limit)

Tagir F . Valeev Sat, 10 Apr 2021 23:58:41 -0700

On Sat, 10 Apr 2021 14:13:55 GMT, Vladimir Sitnikov <[email protected]> 
wrote:


>> With the introduction of `toList()`, preserving the SIZED characteristics in 
>> more cases becomes more important. This patch preserves SIZED on `skip()` 
>> and `limit()` operations, so now every combination of 
>> `map/mapToX/boxed/asXyzStream/skip/limit/sorted` preserves size, and 
>> `toList()`, `toArray()` and `count()` may benefit from this. E. g., 
>> `LongStream.range(0, 10_000_000_000L).skip(1).count()` returns result 
>> instantly with this patch.
>> 
>> Some microbenchmarks added that confirm the reduced memory allocation in 
>> `toList()` and `toArray()` cases. Before patch:
>> ref.SliceToList.seq_baseline:·gc.alloc.rate.norm                    10000  
>> thrpt   10   40235,534 ±     0,984    B/op
>> ref.SliceToList.seq_limit:·gc.alloc.rate.norm                       10000  
>> thrpt   10  106431,101 ±     0,198    B/op
>> ref.SliceToList.seq_skipLimit:·gc.alloc.rate.norm                   10000  
>> thrpt   10  106544,977 ±     1,983    B/op
>> value.SliceToArray.seq_baseline:·gc.alloc.rate.norm                 10000  
>> thrpt   10   40121,878 ±     0,247    B/op
>> value.SliceToArray.seq_limit:·gc.alloc.rate.norm                    10000  
>> thrpt   10  106317,693 ±     1,083    B/op
>> value.SliceToArray.seq_skipLimit:·gc.alloc.rate.norm                10000  
>> thrpt   10  106430,954 ±     0,136    B/op
>> 
>> After patch:
>> ref.SliceToList.seq_baseline:·gc.alloc.rate.norm                    10000  
>> thrpt   10  40235,648 ±     1,354    B/op
>> ref.SliceToList.seq_limit:·gc.alloc.rate.norm                       10000  
>> thrpt   10  40355,784 ±     1,288    B/op
>> ref.SliceToList.seq_skipLimit:·gc.alloc.rate.norm                   10000  
>> thrpt   10  40476,032 ±     2,855    B/op
>> value.SliceToArray.seq_baseline:·gc.alloc.rate.norm                 10000  
>> thrpt   10  40121,830 ±     0,308    B/op
>> value.SliceToArray.seq_limit:·gc.alloc.rate.norm                    10000  
>> thrpt   10  40242,554 ±     0,443    B/op
>> value.SliceToArray.seq_skipLimit:·gc.alloc.rate.norm                10000  
>> thrpt   10  40363,674 ±     1,576    B/op
>> 
>> Time improvements are less exciting. It's likely that inlining and 
>> vectorizing dominate in these tests over array allocations and unnecessary 
>> copying. Still, I notice a significant improvement in SliceToArray.seq_limit 
>> case (2x) and mild improvement (+12..16%) in other slice tests. No 
>> significant change in parallel execution time, though its performance is 
>> much less stable and I didn't run enough tests.
>> 
>> Before patch:
>> Benchmark                         (size)   Mode  Cnt      Score     Error  
>> Units
>> ref.SliceToList.par_baseline       10000  thrpt   30  14876,723 ±  99,770  
>> ops/s
>> ref.SliceToList.par_limit          10000  thrpt   30  14856,841 ± 215,089  
>> ops/s
>> ref.SliceToList.par_skipLimit      10000  thrpt   30   9555,818 ± 991,335  
>> ops/s
>> ref.SliceToList.seq_baseline       10000  thrpt   30  23732,290 ± 444,162  
>> ops/s
>> ref.SliceToList.seq_limit          10000  thrpt   30  14894,040 ± 176,496  
>> ops/s
>> ref.SliceToList.seq_skipLimit      10000  thrpt   30  10646,929 ±  36,469  
>> ops/s
>> value.SliceToArray.par_baseline    10000  thrpt   30  25093,141 ± 376,402  
>> ops/s
>> value.SliceToArray.par_limit       10000  thrpt   30  24798,889 ± 760,762  
>> ops/s
>> value.SliceToArray.par_skipLimit   10000  thrpt   30  16456,310 ± 926,882  
>> ops/s
>> value.SliceToArray.seq_baseline    10000  thrpt   30  69669,787 ± 494,562  
>> ops/s
>> value.SliceToArray.seq_limit       10000  thrpt   30  21097,081 ± 117,338  
>> ops/s
>> value.SliceToArray.seq_skipLimit   10000  thrpt   30  15522,871 ± 112,557  
>> ops/s
>> 
>> After patch:
>> Benchmark                         (size)   Mode  Cnt      Score      Error  
>> Units
>> ref.SliceToList.par_baseline       10000  thrpt   30  14793,373 ±   64,905  
>> ops/s
>> ref.SliceToList.par_limit          10000  thrpt   30  13301,024 ± 1300,431  
>> ops/s
>> ref.SliceToList.par_skipLimit      10000  thrpt   30  11131,698 ± 1769,932  
>> ops/s
>> ref.SliceToList.seq_baseline       10000  thrpt   30  24101,048 ±  263,528  
>> ops/s
>> ref.SliceToList.seq_limit          10000  thrpt   30  16872,168 ±   76,696  
>> ops/s
>> ref.SliceToList.seq_skipLimit      10000  thrpt   30  11953,253 ±  105,231  
>> ops/s
>> value.SliceToArray.par_baseline    10000  thrpt   30  25442,442 ±  455,554  
>> ops/s
>> value.SliceToArray.par_limit       10000  thrpt   30  23111,730 ± 2246,086  
>> ops/s
>> value.SliceToArray.par_skipLimit   10000  thrpt   30  17980,750 ± 2329,077  
>> ops/s
>> value.SliceToArray.seq_baseline    10000  thrpt   30  66512,898 ± 1001,042  
>> ops/s
>> value.SliceToArray.seq_limit       10000  thrpt   30  41792,549 ± 1085,547  
>> ops/s
>> value.SliceToArray.seq_skipLimit   10000  thrpt   30  18007,613 ±  141,716  
>> ops/s
>> 
>> I also modernized SliceOps a little bit, using switch expression (with no 
>> explicit default!) and diamonds on anonymous classes.
>
> test/jdk/java/util/stream/test/org/openjdk/tests/java/util/stream/CountTest.java
>  line 195:
> 
>> 193:             assertEquals(Stream.of(1, 2, 3, 4).peek(e -> 
>> ai.getAndIncrement())
>> 194:                 .parallel().skip(1).limit(2).skip(1).count(), 1);
>> 195:             assertEquals(ai.get(), 0);
> 
> Does it make sense to extract the method here and launch it with different 
> input streams?

Well, it's not that easy as there are tests for IntStream, LongStream, 
DoubleStream and object Stream, so they cannot be generalized under the same 
method. Still extracting four methods reduces some amount of repetition, so 
I'll do this.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3427

Re: RFR: 8265029: Preserve SIZED characteristics on slice operations (skip, limit)

Reply via email to