On 10/4/21 12:57, Сергей Цыпанов wrote:
in the code of HashSet(Collection) we have an optimization opportunity:

    public HashSet(Collection<? extends E> c) {
        map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
        addAll(c);
    }

Instead of using addAll() inherited from j.u.Collection we can use c.forEach(this::add):

    public HashSet(Collection<? extends E> c) {
        map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
        c.forEach(this::add);
    }

This allows us to get rid of the Iterator and to use counting loops instead of hasNext()/next() for e.g. ArrayList.
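The proposed change can be sketched outside the JDK as a static helper (hashSetViaForEach is a hypothetical name, not JDK API). It sizes the backing map the same way HashSet(Collection) does and then populates the set via forEach, which for ArrayList runs an internal indexed loop instead of allocating an Iterator; the observable result is the same set:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ForEachVsAddAll {
    // Hypothetical helper mirroring the proposed constructor body:
    // same initial-capacity computation, but filling via forEach
    // instead of the inherited addAll().
    static <E> Set<E> hashSetViaForEach(List<E> c) {
        Set<E> set = new HashSet<>(Math.max((int) (c.size() / .75f) + 1, 16));
        c.forEach(set::add);   // ArrayList.forEach iterates by index, no Iterator
        return set;
    }

    public static void main(String[] args) {
        List<Integer> src = new ArrayList<>(List.of(1, 2, 3, 2, 1));
        Set<Integer> viaForEach = hashSetViaForEach(src);
        Set<Integer> viaAddAll = new HashSet<>(src); // current ctor uses addAll()
        System.out.println(viaForEach.equals(viaAddAll)); // prints "true"
    }
}
```

The semantics are unchanged; the question in this thread is only whether the iteration-strategy difference is worth the change.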
Well, perhaps, but this sort of low-level hackery is not IMO something we should reflexively do in the core class libraries.
We are also likely to benefit from this change in case the argument collection is synchronized as it's going to be locked only once.
That's a good point, but would a synchronized collection really lock coarsely around forEach(), rather than in add()?
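For the Collections.synchronizedList/synchronizedCollection wrappers the answer appears to be yes: since Java 8 their forEach override holds the wrapper's mutex for the whole traversal, whereas plain iteration is not synchronized by the wrapper at all (the javadoc requires callers to lock manually). A small probe with Thread.holdsLock shows the difference:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SyncForEachLock {
    public static void main(String[] args) {
        List<Integer> sync = Collections.synchronizedList(new ArrayList<>(List.of(1, 2, 3)));

        // The wrapper's forEach is synchronized on its mutex (the wrapper
        // itself, by default), so the lambda runs with the monitor held:
        sync.forEach(i -> System.out.println("forEach holds lock: " + Thread.holdsLock(sync)));

        // Plain iteration goes through an unsynchronized Iterator; the
        // wrapper's monitor is NOT held per element:
        for (Integer i : sync) {
            System.out.println("iterator holds lock: " + Thread.holdsLock(sync));
        }
    }
}
```

So the proposed c.forEach(this::add) would take the source collection's lock once, while the iterator-based addAll() path takes it never (relying on the caller to synchronize), which makes the forEach variant safer as well as coarser.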
I've used the benchmark [1] to analyze the impact, and it demonstrates a measurable improvement [2]. What puzzles me a bit is the result for j.u.ArrayList: at warm-up time it demonstrates better results than at measurement time:

length = 10

# Run progress: 4,44% complete, ETA 00:21:41
# Fork: 3 of 5
# Warmup Iteration   1: 134,699 ns/op
# Warmup Iteration   2: 115,391 ns/op
# Warmup Iteration   3: 130,110 ns/op
# Warmup Iteration   4: 114,465 ns/op  <----
# Warmup Iteration   5: 114,849 ns/op
# Warmup Iteration   6: 115,073 ns/op
# Warmup Iteration   7: 113,988 ns/op
# Warmup Iteration   8: 115,301 ns/op
# Warmup Iteration   9: 115,452 ns/op
# Warmup Iteration  10: 124,493 ns/op  <----
Iteration   1: 123,776 ns/op
Iteration   2: 123,719 ns/op
Iteration   3: 123,725 ns/op
Iteration   4: 126,892 ns/op
Iteration   5: 125,493 ns/op
Iteration   6: 124,661 ns/op
Iteration   7: 123,783 ns/op
Iteration   8: 123,975 ns/op
Iteration   9: 124,047 ns/op
Iteration  10: 123,899 ns/op

length = 100

# Run progress: 11,11% complete, ETA 00:20:10
# Fork: 1 of 5
# Warmup Iteration   1: 1236,695 ns/op
# Warmup Iteration   2: 1069,909 ns/op
# Warmup Iteration   3: 1243,852 ns/op
# Warmup Iteration   4: 1059,038 ns/op  <----
# Warmup Iteration   5: 1057,670 ns/op
# Warmup Iteration   6: 1057,117 ns/op
# Warmup Iteration   7: 1056,932 ns/op
# Warmup Iteration   8: 1054,909 ns/op
# Warmup Iteration   9: 1058,106 ns/op
# Warmup Iteration  10: 1145,699 ns/op  <----
Iteration   1: 1139,155 ns/op
Iteration   2: 1141,662 ns/op
Iteration   3: 1139,061 ns/op
Iteration   4: 1139,306 ns/op
Iteration   5: 1138,475 ns/op
Iteration   6: 1140,491 ns/op
Iteration   7: 1144,017 ns/op
Iteration   8: 1139,411 ns/op
Iteration   9: 1142,729 ns/op
Iteration  10: 1144,042 ns/op

Here I use 1 s warm-up iterations and 2 s measurement iterations. It looks like by iteration 4 the code is fully optimized, and later, at the end of warm-up, a kind of deoptimization happens. There's still a visible improvement, however.
The benchmark is run with 5 forks, and this deoptimization is visible for length = {10, 100}. So my two questions are:

- What is the reason for the deoptimization, and can we do something about it?
C2 does this sort of thing all the time: it's a heuristic probabilistic optimizer. I've seen plenty of bimodal examples, where inlining changes and each time a program is run it's either fast or slow, quasi-randomly. If you really want to know, use -prof perfasm in JMH.
- Is the suggested change worth implementing?
IMO the gain is too small. Others may disagree.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671