On 10/4/21 12:57, Сергей Цыпанов wrote:


In the code of HashSet(Collection) we have an optimization opportunity:

public HashSet(Collection<? extends E> c) {
   map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
   addAll(c);
}

Instead of using addAll(), inherited from j.u.AbstractCollection, we can use
c.forEach(this::add):
public HashSet(Collection<? extends E> c) {
   map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
   c.forEach(this::add);
}

This allows us to get rid of the Iterator and, e.g. for ArrayList, use a
counting loop instead of hasNext()/next().
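
For illustration, a minimal self-contained sketch of the two traversal
shapes being compared (the demo class is mine, not JDK code):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Both copies produce the same set, but the first goes through
// Iterator.hasNext()/next(), while the second lets ArrayList.forEach
// run a counting loop over its backing array.
public class CopyTraversalDemo {
    public static void main(String[] args) {
        List<Integer> src = new ArrayList<>(List.of(1, 2, 3, 4, 5));

        Set<Integer> viaIterator = new HashSet<>();
        for (Integer e : src)          // desugars to hasNext()/next()
            viaIterator.add(e);

        Set<Integer> viaForEach = new HashSet<>();
        src.forEach(viaForEach::add);  // index-based loop inside ArrayList

        System.out.println(viaIterator.equals(viaForEach)); // true
    }
}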

Well, perhaps, but this sort of low-level hackery is not IMO something we
should reflexively do in the core class libraries.

We are also likely to benefit from this change when the argument collection
is synchronized, as it will be locked only once.

That's a good point, but would a synchronized collection really lock
coarsely around forEach(), rather than in each add()?
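
For reference, Collections.synchronizedCollection does override forEach
to hold its mutex once across the whole traversal, while iterating the
wrapper requires manual client-side locking per its javadoc. A minimal
sketch (the demo class name is illustrative):

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Compares the two copy styles against a synchronized wrapper: forEach
// takes the wrapper's mutex once for the whole loop, whereas iteration
// needs a caller-side synchronized block to be safe.
public class SyncCopyDemo {
    public static void main(String[] args) {
        Collection<Integer> sync =
            Collections.synchronizedCollection(new ArrayList<>(List.of(1, 2, 3)));

        Set<Integer> viaForEach = new HashSet<>();
        sync.forEach(viaForEach::add);   // mutex held once, coarsely

        Set<Integer> viaIterator = new HashSet<>();
        synchronized (sync) {            // required for safe iteration
            for (Integer e : sync)
                viaIterator.add(e);
        }

        System.out.println(viaForEach.equals(viaIterator)); // true
    }
}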

I've used the benchmark [1] to analyze the impact, and it demonstrates a
measurable improvement [2].

What puzzles me a bit is the result for j.u.ArrayList. At warm-up time it
demonstrates better results than at measurement time:

length = 10

# Run progress: 4,44% complete, ETA 00:21:41
# Fork: 3 of 5
# Warmup Iteration   1: 134,699 ns/op
# Warmup Iteration   2: 115,391 ns/op
# Warmup Iteration   3: 130,110 ns/op
# Warmup Iteration   4: 114,465 ns/op  <----
# Warmup Iteration   5: 114,849 ns/op
# Warmup Iteration   6: 115,073 ns/op
# Warmup Iteration   7: 113,988 ns/op
# Warmup Iteration   8: 115,301 ns/op
# Warmup Iteration   9: 115,452 ns/op
# Warmup Iteration  10: 124,493 ns/op  <----
Iteration   1: 123,776 ns/op
Iteration   2: 123,719 ns/op
Iteration   3: 123,725 ns/op
Iteration   4: 126,892 ns/op
Iteration   5: 125,493 ns/op
Iteration   6: 124,661 ns/op
Iteration   7: 123,783 ns/op
Iteration   8: 123,975 ns/op
Iteration   9: 124,047 ns/op
Iteration  10: 123,899 ns/op

length = 100

# Run progress: 11,11% complete, ETA 00:20:10
# Fork: 1 of 5
# Warmup Iteration   1: 1236,695 ns/op
# Warmup Iteration   2: 1069,909 ns/op
# Warmup Iteration   3: 1243,852 ns/op
# Warmup Iteration   4: 1059,038 ns/op  <----
# Warmup Iteration   5: 1057,670 ns/op
# Warmup Iteration   6: 1057,117 ns/op
# Warmup Iteration   7: 1056,932 ns/op
# Warmup Iteration   8: 1054,909 ns/op
# Warmup Iteration   9: 1058,106 ns/op
# Warmup Iteration  10: 1145,699 ns/op  <----
Iteration   1: 1139,155 ns/op
Iteration   2: 1141,662 ns/op
Iteration   3: 1139,061 ns/op
Iteration   4: 1139,306 ns/op
Iteration   5: 1138,475 ns/op
Iteration   6: 1140,491 ns/op
Iteration   7: 1144,017 ns/op
Iteration   8: 1139,411 ns/op
Iteration   9: 1142,729 ns/op
Iteration  10: 1144,042 ns/op


Here I use 1 s warm-up iterations and 2 s measurement iterations. It looks
like by warm-up iteration 4 the code is fully optimized, and later, at the
end of warm-up, a kind of deoptimization happens. There's still a visible
improvement, however.

The benchmark is run with 5 forks, and this deoptimization is visible for 
length = {10, 100}.
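
For context, the benchmark has roughly the following shape (an
illustrative reconstruction of the setup described above, not the exact
code linked as [1]):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

// 10 x 1 s warm-up, 10 x 2 s measurement, 5 forks, average time in
// ns/op, with the source collection length as a parameter.
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 10, time = 1)
@Measurement(iterations = 10, time = 2)
@Fork(5)
@State(Scope.Benchmark)
public class HashSetCopyBenchmark {
    @Param({"10", "100"})
    int length;

    List<Integer> source;

    @Setup
    public void setUp() {
        source = new ArrayList<>();
        for (int i = 0; i < length; i++)
            source.add(i);
    }

    @Benchmark
    public HashSet<Integer> copyConstructor() {
        return new HashSet<>(source); // exercises the constructor under test
    }
}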

So my two questions are:
- What is the reason for the deoptimization, and can we do something about it?

C2 does this sort of thing all the time: it's a heuristic, probabilistic
optimizer. I've seen plenty of bimodal examples, where inlining changes
and each time a program is run it's either fast or slow, quasi-randomly.

If you really want to know, use -prof perfasm in JMH.
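
For example, the profiler can also be attached programmatically through
the JMH runner API (the include pattern below is illustrative):

import org.openjdk.jmh.profile.LinuxPerfAsmProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

// Equivalent to passing "-prof perfasm" on the JMH command line.
public class PerfAsmRun {
    public static void main(String[] args) throws Exception {
        Options opts = new OptionsBuilder()
                .include("HashSetCopyBenchmark")  // illustrative pattern
                .addProfiler(LinuxPerfAsmProfiler.class)
                .build();
        new Runner(opts).run();
    }
}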

- Is the suggested change worth implementing?

IMO the gain is too small. Others may disagree.

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
