Hello Vitaly, the TREEIFY_THRESHOLD in HashMap is 8 (6 for UNTREEIFY). Not sure if there have been benchmarks for this, I dont see an justification in the source comments. There is a comment, that it is expected to waste a factor of two time and space when not compareable.
I would also expect that larger thresholds make sense (especially when key is not compareable). The JEP 180 also does not mention results from the benchmarks, maybe somebody knows details? http://openjdk.java.net/jeps/180 Gruss Bernd Am Thu, 8 Jan 2015 18:38:53 -0500 schrieb Vitaly Davidovich <vita...@gmail.com>: > Hmmm, 15 entries doesn't seem like a large enough number to drop > linear search. I'd have expected something like 60-80 entries (in > some binary vs linear search benchmarks I've come across, that seems > to be crossover point). It's hard to beat linear search for small > sets when the comparison function is dirt cheap. > > Sent from my phone > On Jan 8, 2015 6:25 PM, "Bernd Eckenfels" <e...@zusammenkunft.net> > wrote: > > > Hello Peter, > > > > hm not sure what you mean, i was not suggesting the regression is > > caused by simpler hashCode bits. (do you mean my comment about the > > removed randomizer? that is not a problem for the benchmark, but it > > might be an opportunity for DOS (again)). > > > > I think the regression itself is caused by the fact that a tree is > > used instead of the linear search. The benchmark does colide > > heavily, but only places 15 or so entries in one bucket. This is > > enough to trigger the migration from linear search to tree, but not > > enough to hurt for linear search performance (at least I think this > > is the case). > > > > The way the nodes are constructed they actually do sort pretty good > > if compareable is implemented. It would most likely not help if > > Compareable cannot distinguish more than hashCode would. > > > > Gruss > > Bernd > > > > > > > > Am Thu, 08 Jan 2015 23:10:10 +0100 > > schrieb Peter Levart <peter.lev...@gmail.com>: > > > > > Bernd, > > > > > > I tried to change the "comparableClassFor" myself and it didn't > > > work (HashMap is used very early in boot-up sequence and > > > initializing ClassValue at that time triggers a NPE). > > > > > > Anyway, caching of "comparableClassFor" result would only > > > potentially improve the "badDistWithComp" result. But can not > > > improve "badDistNoComp" which is the one with speed regression as > > > you're benchmark suggests. > > > > > > But your feeling that this is caused by "simpler" hashCode bits > > > spreading function is not correct. I tried to replace the hash() > > > method with the one that was in HM before and I get comparable > > > results. This is the JDK8 HashMap.hash() method: > > > > > > static final int hash(Object key) { > > > int h; > > > return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> > > > 16); } > > > > > > Benchmark (initialSize) Mode > > > Samples Score Score error Units > > > j.t.HashMapCollision.badDistNoComp 16 avgt 4 > > > 3171.264 1152.995 ms/op > > > j.t.HashMapCollision.badDistWithComp 16 avgt 4 > > > 2819.342 422.861 ms/op > > > j.t.HashMapCollision.goodDistNoComp 16 avgt 4 > > > 1026.064 72.049 ms/op > > > j.t.HashMapCollision.goodDistWithComp 16 avgt 4 > > > 1025.312 39.858 ms/op > > > > > > > > > ...and this is my re-interpretation of pre JDK8 HashMap.hash(): > > > > > > > > > static final int randomHash = > > > mix32(System.currentTimeMillis() ^ System.nanoTime()); > > > > > > private static int mix32(long z) { > > > z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL; > > > return (int)(((z ^ (z >>> 33)) * 0xc4ceb9fe1a85ec53L) >>> > > > 32); } > > > > > > static final int hash(Object k) { > > > int h = k == null ? randomHash : randomHash ^ > > > k.hashCode(); > > > > > > // This function ensures that hashCodes that differ only > > > by // constant multiples at each bit position have a bounded > > > // number of collisions (approximately 8 at default load > > > factor). h ^= (h >>> 20) ^ (h >>> 12); > > > return h ^ (h >>> 7) ^ (h >>> 4); > > > } > > > > > > Benchmark (initialSize) Mode > > > Samples Score Score error Units > > > j.t.HashMapCollision.badDistNoComp 16 avgt 4 > > > 3257.348 1079.088 ms/op > > > j.t.HashMapCollision.badDistWithComp 16 avgt 4 > > > 2866.740 414.687 ms/op > > > j.t.HashMapCollision.goodDistNoComp 16 avgt 4 > > > 1041.068 99.370 ms/op > > > j.t.HashMapCollision.goodDistWithComp 16 avgt 4 > > > 1041.653 53.925 ms/op > > > > > > > > > Your benchmark does not show much difference. Perhaps the > > > regression for "badDistNoComp" case could be caused by the fact > > > that with really bad hashCode and no Comparable interface, the > > > red-black tree becomes less performant to search than a simple > > > linked list of Nodes... > > > > > > > > > Regards, Peter > > > > > > > > > > > > > > > > > > On 01/08/2015 08:41 PM, Bernd Eckenfels wrote: > > > > Hello Peter, > > > > > > > > yes it is only keys without an Compareable interface, but they > > > > are quite common. I think the main problem with the internal > > > > comparator (based on instance identity) is, that it would work > > > > for looking up the same instance again, but not for the case > > > > where the actual instance is re-created (as in my example code). > > > > > > > > I would love to test your modified code, but I don't have a > > > > OpenJDK build environment handy. Or actually I can try to get > > > > one, is there somewhere a Virtulisation or Cloud Image > > > > available which is pre-installed? > > > > > > > > I have (1.6) a compiled benchmark.jar here, in case anyone > > > > wants to try it: > > > > > > > > > > https://onedrive.live.com/redir?resid=A98B6F4E09966AFD!20440&authkey=!AFFk03-5jq21Xz0&ithint=file%2cjar > > > > > > > > BTW: I am (as usual) not expecting commoents on that, but just > > > > to mention it: the expected good behavior of degenerated > > > > hashmaps (due to the tree) was reason for removing the hashcode > > > > secret randomization. I wonder if that was such a good idea if > > > > colliding lookups (with more than a handfull of entries in a > > > > bucket) have this 50% penalty. > > > > > > > > Greetings > > > > Bernd > > > > > > > > > > > > > > > > Am Thu, 08 Jan 2015 20:22:13 +0100 > > > > schrieb Peter Levart <peter.lev...@gmail.com>: > > > > > > > >> Hi Bernd, > > > >> > > > >> It seems that only bad hash codes (without comparable keys) in > > > >> JDK8 HM are worse than JDK7 HM. > > > >> > > > >> > > > >> Since you have already taken time to measure JDK7 vs JDK8 HM, > > > >> could you try to take the JDK8 source and just replace the > > > >> internal "comparableClassFor" method with the following > > > >> implementation: > > > >> > > > >> static final ClassValue<Boolean> selfComparable = new > > > >> ClassValue<Boolean>() { > > > >> @Override > > > >> protected Boolean computeValue(Class<?> c) { > > > >> Type[] as; ParameterizedType p; > > > >> for (Type t : c.getGenericInterfaces()) { > > > >> if (t instanceof ParameterizedType && > > > >> (p = (ParameterizedType) t).getRawType() > > > >> == Comparable.class && > > > >> (as = p.getActualTypeArguments()).length > > > >> == 1 && as[0] == c) // type arg is c > > > >> return true; > > > >> } > > > >> return false; > > > >> } > > > >> }; > > > >> > > > >> static Class<?> comparableClassFor(Object x) { > > > >> if (x instanceof Comparable) { > > > >> Class<?> c = x.getClass(); > > > >> if (c == String.class || selfComparable.get(c)) { > > > >> return c; > > > >> } > > > >> } > > > >> return null; > > > >> } > > > >> > > > >> > > > >> ...and retry your measurements. I just want to see if it has > > > >> any impact. > > > >> > > > >> > > > >> Thanks, Peter > > > >> > > > >> > > > >> On 01/08/2015 05:38 PM, Bernd Eckenfels wrote: > > > >>> Hello, > > > >>> > > > >>> I think it was topic before, but I just wanted to point out, > > > >>> that it is still an topic on the internetz. :) > > > >>> > > > >>> Motivated by a StackOverflow question regarding HashMap > > > >>> performance regression in Java 8 > > > >>> > > > >>> > > http://stackoverflow.com/questions/27759527/using-java-7-hashmap-in-java-8/27760442 > > > >>> > > > >>> I made a JMH test and compared 7 and 8 speed. (the test is not > > > >>> very scientific as I dont really know how to squeeze the > > > >>> longrunning loop which alters state into the harness, but the > > > >>> results seem to be consitent wth theory and stopwatch testing) > > > >>> > > > >>> https://gist.github.com/ecki/9f69773eb29428a36077 > > > >>> > > > >>> What can be seen is, that with a good distribution of hash > > > >>> keys 8 looks faster than 7, and with a bad distribution of > > > >>> hash keys Java 7 is faster (unless you supply a Comparator > > > >>> for the key). (and a good distributed hashkey with comparable > > > >>> seems to be a bit slower) > > > >>> > > > >>> I think the regression is somewhat expected, but I guess its > > > >>> not widely known. > > > >>> > > > >>> (I do not use a cached hashcode, but it has a nearly trivial > > > >>> implementation just to make it more life like. the tests also > > > >>> compares different initial sizes, but they do not have an > > > >>> measurable effect on the performance, I show only default size > > > >>> below:) > > > >>> > > > >>> java version "1.7.0_72" > > > >>> > > > >>> Benchmark (initialSize) Mode Samp Score > > > >>> Error Units n.e.j.h.HashMapCollision.badDistNoComp 16 avgt > > > >>> 4 10847,318 ± 5596,690 ms/op > > > >>> n.e.j.h.HashMapCollision.badDistWithComp 16 avgt 4 > > > >>> 10761,430 ± 5376,975 ms/op > > > >>> n.e.j.h.HashMapCollision.goodDistNoComp 16 avgt 4 > > > >>> 3613,923 ± 254,823 ms/op > > > >>> n.e.j.h.HashMapCollision.goodDistWithComp 16 avgt 4 > > > >>> 3656,229 ± 274,350 ms/op java version "1.8.0_25" > > > >>> > > > >>> Benchmark (initialSize) Mode Samp Score > > > >>> Error Units n.e.j.h.HashMapCollision.badDistNoComp 16 avgt > > > >>> 4 14309,880 ± 1811,709 ms/op <-slower > > > >>> n.e.j.h.HashMapCollision.badDistWithComp 16 avgt 4 > > > >>> 8232,037 ± 5974,530 ms/op > > > >>> n.e.j.h.HashMapCollision.goodDistNoComp 16 avgt 4 > > > >>> 3304,698 ± 116,866 ms/op > > > >>> n.e.j.h.HashMapCollision.goodDistWithComp 16 avgt 4 > > > >>> 3425,762 ± 107,659 ms/op > > > >>> > > > >>> > > > >>> Greetings > > > >>> Bernd > > > > > > > > >