Hello, Paul! Thank you. Here's issue: https://bugs.openjdk.java.net/browse/JDK-8164189 And webrev: http://cr.openjdk.java.net/~tvaleev/webrev/8164189/r1/
With best regards, Tagir Valeev. PS> Hi Tagir, PS> I like it. Please open an issue and i will sponsor the fix for you. PS> Thanks, PS> Paul. >> On 8 Aug 2016, at 22:31, Tagir F. Valeev <amae...@gmail.com> wrote: >> >> Hello! >> >> I'd like to propose a simple performance patch for Collectors.toSet(): >> >> --- a/src/java.base/share/classes/java/util/stream/Collectors.java Tue >> Aug 02 07:19:06 2016 +0530 >> +++ b/src/java.base/share/classes/java/util/stream/Collectors.java Tue >> Aug 09 11:47:20 2016 +0700 >> @@ -295,7 +295,12 @@ >> public static <T> >> Collector<T, ?, Set<T>> toSet() { >> return new CollectorImpl<>((Supplier<Set<T>>) HashSet::new, Set::add, >> - (left, right) -> { left.addAll(right); >> return left; }, >> + (left, right) -> { >> + if (left.size() < right.size()) { >> + right.addAll(left); return right; >> + } >> + left.addAll(right); return left; >> + }, >> CH_UNORDERED_ID); >> } >> >> This will add the smaller set to the larger which may improve parallel >> performance when subtasks produced uneven results. In normal case the >> additional overhead is insignificant (additional HashSet size >> comparison per each combine operation which usually performed >> ~4*parallelism times). >> >> Here's simple JDK benchmark with results (Core-i7-4702MQ 4 HT cores, >> Win7, JDK 9-ea+129 64bit) which confirms the speedup: >> http://cr.openjdk.java.net/~tvaleev/patches/toSet/ >> >> When data is unevenly distributed, it's significantly faster: >> >> Original: >> ToSet.toSetParallel 10000 true avgt 50 88,580 ± 0,874 us/op >> ToSet.toSetParallel 100000 true avgt 50 676,243 ± 41,188 us/op >> ToSet.toSetParallel 1000000 true avgt 50 9126,399 ± 83,260 us/op >> Patched: >> ToSet.toSetParallel 10000 true avgt 50 62,121 ± 1,207 us/op >> ToSet.toSetParallel 100000 true avgt 50 556,968 ± 37,404 us/op >> ToSet.toSetParallel 1000000 true avgt 50 6572,372 ± 183,466 us/op >> >> When data is evenly distributed, no statistically significant changes >> observed: >> >> Original: >> ToSet.toSetParallel 10000 false avgt 50 177,137 ± 5,941 >> us/op >> ToSet.toSetParallel 100000 false avgt 50 2056,332 ± 87,433 >> us/op >> ToSet.toSetParallel 1000000 false avgt 50 51864,198 ± 560,883 >> us/op >> Patched: >> ToSet.toSetParallel 10000 false avgt 50 172,606 ± 7,321 us/op >> ToSet.toSetParallel 100000 false avgt 50 1922,478 ± 79,593 us/op >> ToSet.toSetParallel 1000000 false avgt 50 52648,100 ± 621,526 us/op >> >> What do you think? Should I open an issue? >> >> With best regards, >> Tagir Valeev. >>